ODK - Making sure that values do not exceed 100%

I have three questions in a form I'm designing that are concerned with percentage values. I need to make sure that the total of the three answers does not exceed 100. How do I do this? I'm currently using Kobo Form Designer.

You need to add a constraint expression to your prompt.
constraint="(/data/int1 + /data/int2 + /data/int3) <= 100"
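The constraint's arithmetic is easy to sanity-check outside the form. A plain-Python sketch of what the XPath expression evaluates (illustrative only, not ODK code; the function name is my own):

```python
def within_total(int1, int2, int3, limit=100):
    """Mirror of the XPath constraint: the three values must sum to at most limit."""
    return int1 + int2 + int3 <= limit

print(within_total(40, 30, 30))  # True  -> record is accepted
print(within_total(50, 40, 20))  # False -> constraint violation
```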

Related

How do I subtract values in the SAME variable column in SPSS?

So I'm trying to create a new variable column of ''first differences'' by subtracting values in the SAME column but have no clue how to do so on SPSS. For example, in this picture:
1st value - 0 = 0 (obviously). 2nd value - 1st value =..., 3rd value - 2nd value =..., 4th value - 3rd value =... and so on.
Also, if there is a negative number, does SPSS allow me to log it/regress it? Once I find the first difference, I'm going to LOG it & then regress it. For context, the reason I'm doing this is that it's part of a bigger equation to find out how economic growth and a CHANGE in economic growth (hence the first difference and log) will affect the variable I'm studying.
Thanks.
To calculate differences between values in consecutive rows, use this:
if ($casenum > 1) diffs = FinalConsumExp - lag(FinalConsumExp).
execute.
If you need help with additional problems please start a separate question for each problem.
HTH.
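For comparison, the same first-difference logic sketched in plain Python (not SPSS syntax; the function name is illustrative):

```python
def first_differences(values):
    """diffs[i] = values[i] - values[i-1]; the first row has no prior value, so None."""
    return [None] + [b - a for a, b in zip(values, values[1:])]

print(first_differences([100, 103, 101, 108]))  # [None, 3, -2, 7]
```

This mirrors what lag() does in the SPSS snippet: each row is paired with the row before it, and the first row stays empty.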

I wish to put this condition in a more sargable way

In my SQL Server 2017 Standard CU20 instance there is a very frequently executed query with this awful condition:
AND F.progressive_invoice % @numberofservicesInstalled = @idService
Is there a mathematical way to rewrite it in a form more convenient for SQL Server?
The query takes from 240 to 500 ms.
Can you help me do better? Please.
What do you think is particularly awful here?
Is this query performing badly? Are you sure that this condition is responsible?
This % is the modulo operator.
SELECT 13 % 5; --remainder is 3
--This is roughly what your code is doing:
DECLARE @Divisor INT = 5; --Switch this value
DECLARE @CompareRemainder INT = 3;
SELECT CASE WHEN 13 % @Divisor = @CompareRemainder THEN 'Remainder matches variable' ELSE 'no match' END;
Your line of code tells the engine to compute an integer division of F.progressive_invoice by the variable @numberofservicesInstalled and take the remainder. The result of this computation is then compared to the variable @idService.
As this computation must be done for each row, an index will not help here...
I do not think this can be made more sargable.
UPDATE
In a comment you suggest that it might help to change the code on the right-hand side of the equals operator. No, this will not help.
I tried to think of a sensible meaning for this... Is the number of a service (or, as the variable suggests, its ID) somehow hidden in the invoice number?
Regarding the execution plan and row estimation:
The engine will see that this must be computed for all rows. It would help to apply any other filter before this one has to run. But you do not show enough; the line of code we see is just one part of a condition...
Indexes and statistics will surely play their roles too...
The short answer to your direct question is no. You cannot rearrange this expression into a sargable predicate. At least, not with a construct that is finite and agnostic of the possible values of @numberofservicesInstalled.
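To see why a range rewrite is impossible: the values satisfying `x % d = r` form an unbounded arithmetic progression, not a contiguous interval that an index seek could cover. A small Python sketch of the matching set (illustrative only, not T-SQL; the function name is my own):

```python
def matching_values(divisor, remainder, upper_bound):
    """All v in [0, upper_bound) with v % divisor == remainder."""
    return [v for v in range(upper_bound) if v % divisor == remainder]

# Every divisor-th value matches: an arithmetic progression, not a range.
print(matching_values(5, 3, 30))  # [3, 8, 13, 18, 23, 28]
```

Because the matching rows are scattered evenly through the whole key space, any index on the column would still have to be scanned, not seeked.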

How do I name columns that contain restricted characters so that the name is intuitive?

I have to make a database column that stores the number of people who scored greater than or equal to a threshold of 0.01, 0.1, 1, and 10 at a particular metric I'm tracking.
For example, I want to store the following data
Number units quality score >= 0.01
Number units quality score >= 0.1
Number units quality score >= 1
Number units quality score >= 10
So my question is...
How do I name my columns to communicate this? I can't use the >= symbol or the . character. I was thinking something like...
gte_score_001
gte_score_01
gte_score_1
gte_score_10
Does this seem intuitive? Is there a better way to do this?
It's subjective, but I'd say score_gte_001 would be slightly more intuitive. meets_thresh_001 would be another option that may be slightly clearer than gte.
Then there are the numbers. Avoid the decimal point problem by referring to the numbers either explicitly or implicitly as hundredths:
meets_thresh_1c
meets_thresh_10c
meets_thresh_100c
meets_thresh_1000c
Your examples are all either integers or less than one, so there isn't any ambiguity; but that itself might not be obvious to someone else looking at the name, and you might have to add more confusing or even conflicting groups later. What would gte_score_15 mean? It could be read as either >= 15 or >= 1.5. And you might need to represent both one day, so your naming should try to be future-proof as well as intuitive.
Including a delimiter to show where the decimal point goes would make it clearer, at least once you know the scheme. To me it makes sense to use the numeric format model character for the decimal separator, D:
gte_score_0d01
gte_score_0d1
gte_score_1
gte_score_1d5
gte_score_10
gte_score_15
though I agree with @L.ScottJohnson that score_gte_0d01 etc. scans better. Again, it's subjective.
If there is a maximum value for the metric, and a maximum precision, it might be worth including leading and trailing zeros. Say, if it can never be more than two digits, and no more than two decimal places:
score_gte_00d01
score_gte_00d10
score_gte_01d00
score_gte_01d50
score_gte_10d00
score_gte_15d00
The delimiter is somewhat redundant as long as you know the pattern - without it, the threshold is the numeric part / 100. But it's arguably clearer with it, and with the padding it's probably even more obvious what the d represents.
If you go down this route then I'd suggest you come up with a scheme that makes sense to you, then show it to colleagues and see if they can interpret it without any hints.
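The padded scheme above can be generated mechanically, which also documents it. A hypothetical Python helper (the function name and padding widths are my own assumptions):

```python
def threshold_column_name(value, int_digits=2, dec_digits=2):
    """Format a threshold like 0.01 as 'score_gte_00d01', using 'd' as the decimal separator."""
    whole = int(value)
    frac = round((value - whole) * 10 ** dec_digits)
    return f"score_gte_{whole:0{int_digits}d}d{frac:0{dec_digits}d}"

print(threshold_column_name(0.01))  # score_gte_00d01
print(threshold_column_name(1.5))   # score_gte_01d50
print(threshold_column_name(10))    # score_gte_10d00
```

Keeping the generator next to the schema script makes it easy to show colleagues exactly how the names were produced.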
You could (and arguably should) normalise your design instead, into a separate table which has a column for the threshold (which can then be a simple number) and another column for the corresponding number of people for each metric. That makes it easier to add more thresholds later (adding extra rows is easier than adding an extra column) and makes the naming problem go away. (You could add a view to pivot back into the wide layout, but then you're back to your naming problem.)
Do you really want to do that? What if it turns out that you need an additional threshold of 20? Or 75? Or...? Why wouldn't you normalize it and use two columns: one that represents a threshold, and another that represents the number of people?
So, instead of
create table some_table
(<some_columns_here>,
gte_score_001 number,
gte_score_01 number,
gte_score_1 number,
gte_score_10 number
);
use
create table some_table
(<some_columns_here>,
threshold number,
number_of_people number
);
comment on column some_table.number_of_people is
'Represents number of people who scored greater than or equal to a threshold';
and store values like
insert into some_table (threshold, number_of_people) values (0.01, 13);
insert into some_table (threshold, number_of_people) values (0.1, 56);
insert into some_table (threshold, number_of_people) values (1, 7);
If you have to implement a new threshold value, no problem - just insert it as
insert into some_table (threshold, number_of_people) values (75, 24);
Doing your way, you'd have to
alter table some_table add gte_score_75 number;
and - what's even worse - modify all other program units that reference that table (stored procedures, views, forms, reports ... the list is quite long).
Storing the number of people above a threshold will cause a problem no matter which of these solutions you use. Instead, use a view over the raw data. If you need a new split, you create a new view; you don't have to recalculate any columns. This example is not efficient, but it illustrates the idea:
create or replace view thresholds as
select
(select count(*) c from rawdata where score >= .1) as ".1"
, (select count(*) c from rawdata where score >= 1 ) as "1"
, (select count(*) c from rawdata where score >= 10 ) as "10"
from dual
Double quotes around the column name alias will let you get away with a lot of things. That removes most limitations on restricted characters and keywords.
I suggest something like this ought to work for display purposes.
SELECT 314 "Number units quality score >= 0.01" FROM DUAL

Why is this Formula for Alteryx returning 0's instead of averages

I was wondering what is wrong with the following formula.
IF [Age] = Null() THEN Average([Age]) ELSE [Age] ENDIF
What I am trying to do: "If the cell is blank, then fill the cell with the average of all other cells called [Age]."
Many thanks all!
We do a lot of imputation to correct null values during our ETL process, and there are really two ways of accomplishing it.
The First Way: Imputation tool. You can use the "Imputation" tool in the Preparation category. In the tool options, select the fields you wish to impute, click the radio button for "Null" on Incoming Value to Replace, and then click the radio button for "Average" in the Replace With Value section. The advantages of using the tool directly are that it is much less complicated than the other way of doing it. The downsides are 1) if you are attempting to fix a large number of rows relative to machine specs it can be incredibly slow (much slower than the next way), and 2) it occasionally errors when we use it in our process without much explanation.
The Second Way: Calculate averages and use formulas. You can also use the "Summarize" tool in the Transform category to generate an average field for each column. After generating the averages, use the "Append" tool in the Join category to join them back into the stream. You will have the same average values for each row in your database. At that point, you can use the Formula tool as you attempted in your question. E.g.
IF IsNull([Age]) THEN [Ave_Age] ELSE [Age] ENDIF
The second way is significantly faster to run for extremely large datasets (e.g. fixing possible nulls in a few dozen columns over 70 million rows), but is much more time intensive to set up and must be created for each column.
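The Summarize-then-Formula pattern is ordinary mean imputation. A minimal plain-Python sketch of the same idea (illustrative only, not Alteryx; the function name is my own):

```python
def impute_mean(values):
    """Replace None entries with the mean of the non-null values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)  # the "Summarize" step
    # The "Formula" step: keep real values, fill gaps with the mean
    return [mean if v is None else v for v in values]

print(impute_mean([25, None, 40, None, 35]))
```

Note the average is computed once over the whole column and then reused per row, exactly as the Append tool makes the same average value available on every row.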
That is not the way the Average function works. You need to pass it the entire list of values, not just one.

Generate random 10 digit number as part of ID and ensure no duplicates

I have a request as part of an application I am working on to generate a UserId with the following criteria:
It will have a set 4 digit prefix
The next 10 digits will be random
A check-digit will be added on the end
Points 1 and 3 are straightforward, but what would be the best way to generate a 10 digit random number whilst ensuring that it hasn't already been used?
I don't particularly like the idea of choosing one randomly, seeing if it has been taken and then either accepting or trying again.
My other thought was to generate a list of X numbers in advance (X being a number greater than the number of accounts you expect to be created) and just take the next one off the list as the accounts are created.
Any thoughts?
EDIT:
Let's bring technology into this. If I am using a SQL Server database, is there a way I can make the database do this for me? E.g. enforce a unique constraint and have the database generate the number?
Encryption. See this answer.
What you need is basically to encrypt a sequential ID, thus producing a seemingly random number.
What's especially good about this is that you can do this all client-side, albeit in two consecutive transactions.
The only way to be 100% sure that no other user ID has the same random digits is to compare the generated value against all existing users; you have to do it at some point.
The only possibility is to do this code-side.
BUT you can save the randomly generated ID to the user database and compare it with your new key (in your code), e.g.
SELECT * FROM xy WHERE userid = 'newuserid'
If the result is null, your key was never generated before.
Thanks to Anton's answer I found this C# implementation of the skip32 encryption algorithm.
https://github.com/eleven41/Eleven41.Skip32
Using this I can pass in an incrementing database identity integer and get a nice random looking 9/10 digit number from it. So a simple call like
int result = cipher.Encrypt(databaseIdentity);
will give me the number I want. No need for duplicate checking as each will be unique.
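Skip32 itself is a dedicated 32-bit block cipher; as a much weaker hypothetical illustration of the underlying idea (any invertible function on a fixed range maps distinct sequential IDs to distinct scrambled-looking outputs), here is an affine mapping in Python. This is not Skip32 and not secure, just a sketch; all names and constants are my own:

```python
# Map sequential IDs to unique-looking 10-digit numbers via modular arithmetic.
MODULUS = 10**10
MULTIPLIER = 387_420_489   # odd and not divisible by 5, hence coprime to 10^10
OFFSET = 1_234_567_890

def obfuscate(n):
    """Invertible: distinct inputs in [0, MODULUS) give distinct outputs."""
    return (n * MULTIPLIER + OFFSET) % MODULUS

def deobfuscate(code):
    """Recover the original sequential ID via the modular inverse (Python 3.8+)."""
    inv = pow(MULTIPLIER, -1, MODULUS)
    return ((code - OFFSET) * inv) % MODULUS
```

Because the mapping is a bijection on the 10-digit range, duplicate checking is unnecessary for the same reason as with Skip32: two different database identities can never collide.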