I have this dataset with 2 variables: week and brand_chosen, where brand_chosen designates which product (e.g. from a supermarket) was chosen, and it looks like this:
Week brand_chosen
2 19
2 15
2 50
2 12
3 19
3 16
3 50
4 77
4 19
What I am trying to do is, for each line, to note the week in which the brand purchase was made and check whether the same brand was purchased in the week before. If it was, a variable dummy would take the value 1, otherwise 0.
Because week appears multiple times I cannot take just the lag(week,1), so I probably need to loop through the week variables for each case, until it finds the first different value.
This is what I tried to do:
loop i=1 to 70.
do if (week<>lag(week,i) and brand_chosen=lag(brand_chosen,i)).
compute dummy=1.
end loop.
else.
compute dummy=0.
end if.
end loop.
execute.
Where 70 is just an arbitrary number so that I am sure that it will check all the previous cases.
I get two problems with that. First, from what I understand the lag function needs to contain a number, but "i" is not considered a number here.
The second problem is that I would like to close the loop if the condition is satisfied and move on to the next case, but I get an error.
I am new to SPSS syntax and I am struggling with this one, so any help is greatly appreciated.
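I assume that every combination of week and brand_chosen is unique. In that case the solution is quite simple: sort your dataset by brand_chosen and then by week, and then run a simple LAG comparison.
This should do the trick:
SORT CASES BY brand_chosen week.
COMPUTE dummy=0.
IF (brand_chosen=LAG(brand_chosen) AND week>LAG(week)) dummy = 1.
EXECUTE.
Note that this flags any earlier purchase of the same brand. If the purchase must fall in exactly the preceding week, use week=LAG(week)+1 as the second condition instead.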
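For comparison, here is a minimal Python sketch of the same check outside SPSS. It assumes that "the week before" means the closest earlier week that actually occurs in the data, and that rows are simple (week, brand) pairs; the function and variable names are made up for illustration.

```python
def previous_week_dummy(rows):
    """rows: list of (week, brand) tuples; returns a 0/1 flag per row."""
    weeks = sorted({week for week, _ in rows})
    # Map each week to the closest earlier week that occurs in the data.
    previous = {later: earlier for earlier, later in zip(weeks, weeks[1:])}
    seen = set(rows)
    # A row is flagged when the same brand appears in that previous week.
    return [int((previous.get(week), brand) in seen) for week, brand in rows]


# Sample data from the question.
rows = [(2, 19), (2, 15), (2, 50), (2, 12),
        (3, 19), (3, 16), (3, 50),
        (4, 77), (4, 19)]
dummy = previous_week_dummy(rows)
print(dummy)
```

Rows in the first week have no predecessor, so they always get 0.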
I have a dataset containing a number of persons who have been involved in an accident. Each person has been in an accident at a different time, and I have coded a variable start_week which indicates the week number, counted from a certain date (January 1st 2011), in which the accident occurred.
For each individual I also have a variable for each week after January 1st 2011 that shows whether or not this individual was hospitalized. I now need to count how many weeks a person has been hospitalized in the XX weeks after the accident.
The desired result is a column like sum_week that sums the weekly values from the week given in start_week onwards.
Id start_week week_1 week_2 week_3 week_4 sum_week
1 2 1 0 1 1 2
2 3 1 0 0 1 1
I think this can be done using an array, but I have no idea how. If it isn't possible to count across columns based on the variable start_week, I am planning on transposing my data. I would however prefer if this could be done without having to transpose my data.
Any help is much appreciated!
Just use the START_WEEK as the initial value in the DO loop you use to check the array.
data want;
  set have;
  array week_[4];  /* refers to week_1-week_4 */
  sum_week = 0;
  /* start counting at the week of the accident */
  do index = start_week to dim(week_);
    sum_week + week_[index];
  end;
  drop index;
run;
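The same idea, sketched in Python for readers who want to check the logic outside SAS. This is only an illustration: the record layout and the function name are assumptions, and the two rows are the ones from the question.

```python
def weeks_hospitalized(start_week, weeks):
    """Sum the weekly 0/1 flags from start_week (1-based) to the last week."""
    return sum(weeks[start_week - 1:])


records = [
    {"id": 1, "start_week": 2, "weeks": [1, 0, 1, 1]},
    {"id": 2, "start_week": 3, "weeks": [1, 0, 0, 1]},
]
sum_week = [weeks_hospitalized(r["start_week"], r["weeks"]) for r in records]
print(sum_week)
```

The results match the sum_week column the question asks for (2 and 1).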
I have a measurement that records when the state of something is changing. Every second the system checks whether it is changing; if so, it stores 1 in the database, otherwise nothing. So I have sequences of "ones".
As here. Distance between points is 1s
I want to get only the time of first point of each "one" sequence. On this particular example it would be
Time Value
2019-01-01 11:46:55 1
2019-01-01 12:36:45 1
In red squares
Is there a way to do it using queries? Or maybe an easy Python pattern?
P.S. The first() selector requires GROUP BY, but I cannot assume that the sequences are shorter than some_time_interval.
You could probably achieve what you need with a nested query and the difference function. https://docs.influxdata.com/influxdb/v1.7/query_language/functions/#difference
For example:
Let's say your measurement is called change and the field is called value
SELECT diff FROM
(SELECT difference(value) AS diff FROM
(SELECT max(value) AS value FROM change WHERE your_time_filter GROUP BY time(1s) fill(0))
)
WHERE diff > 0
So you first fill in the empty spaces with zeros (InfluxQL needs an aggregate such as max() whenever you GROUP BY time(), which is why the innermost query selects max(value)). Then take the differences between subsequent values. Then pick the differences that are positive. If the difference is 0, it is between two zeros or two ones: no change. If it is 1, the change is from zero to one (what you need). If it is -1, the change is from one to zero, i.e. the end of a sequence of ones.
You may need to tweak the query a bit with time intervals and groupings.
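If you would rather do it client-side, as the question suggests, a minimal Python sketch could look like the following. It assumes you have already fetched the timestamps of the stored points; the function name is hypothetical, and apart from the two start times from the question the sample points are made up.

```python
from datetime import datetime, timedelta


def sequence_starts(timestamps, step=timedelta(seconds=1)):
    """Return the timestamps that open a new run of consecutive points."""
    starts = []
    previous = None
    for current in sorted(timestamps):
        # A point starts a run when its predecessor is more than one step away.
        if previous is None or current - previous > step:
            starts.append(current)
        previous = current
    return starts


fmt = "%Y-%m-%d %H:%M:%S"
points = [datetime.strptime(s, fmt) for s in (
    "2019-01-01 11:46:55", "2019-01-01 11:46:56", "2019-01-01 11:46:57",
    "2019-01-01 12:36:45", "2019-01-01 12:36:46",
)]
starts = [t.strftime(fmt) for t in sequence_starts(points)]
print(starts)
```

This does not depend on how long a sequence lasts, only on the 1s spacing between its points.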
I have a simple daily rainfall data set and would like to calculate the antecedent dry period for each day. Here, I'm defining a dry day to be "<10". I'm fairly unfamiliar with INDEX(), MATCH(), and other fancy array functions but feel like I'll need to use them.
For example, in the image, for 1/17/2020, the values in cells C3:C9=0, C10=1, C11:C13=0. I've tried various versions of COUNTIF(), COUNTIFS(), and IF() functions but I cannot get the step-wise + re-set functionality necessary when extended "dry spells" or brief rain periods occur with gaps. Thanks!
You are right, you need to use Match. Basically you need to search for the next antecedent wet day (of which there are many here in Manchester England at the moment) and subtract 1 (Formula 1):
=MATCH(TRUE,INDEX(B15:B$1000>=10,0),0)-1
where B$1000 may need changing to include all of your data. The use of Index here is just a bit of a hack to avoid having to enter the formula as an array formula.
As you can see there is an issue when you come to the end of the range which I will come to in a minute.
In this case, we want to count the number of antecedent dry days to the end of the range (Formula 2):
=IFERROR(MATCH(TRUE,INDEX(B4:B$1000>=10,0),0)-1,COUNTIF(B4:B$1000,"<10"))
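To sanity-check what the formulas compute, here is a small Python sketch of the same counting rule. It assumes the values are listed in the same order as the spreadsheet column and that a wet day is a value of 10 or more; the sample values and the function name are made up.

```python
def dry_run_lengths(rainfall, wet_threshold=10):
    """For each position, count the dry values from there up to the next wet value."""
    counts = [0] * len(rainfall)
    run = 0
    # Walk backwards so each position can reuse the count of the one after it.
    for i in range(len(rainfall) - 1, -1, -1):
        run = 0 if rainfall[i] >= wet_threshold else run + 1
        counts[i] = run
    return counts


rainfall = [0, 0, 12, 0, 3, 0, 25, 0, 0]  # made-up values
counts = dry_run_lengths(rainfall)
print(counts)
```

Like Formula 2, a dry spell that runs to the end of the data is simply counted up to the last value.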
If the range ended with a dry spell, you would get this:
Similar questions exist on Stack Overflow, but given a time in 24-hour HH:MM format, what is the best way to find the next time that can be formed from the same digits?
For example, if 12:00 is given then the answer is 21:00.
I think I’d generate all possible times from the digits of the original time (here 12:00), sort them and see which comes after the original one.
With more detail:
1. Generate all unique permutations of the digits. If you don’t know how to generate permutations, search the web; it’s covered in many places. It’s probably easiest to generate all permutations and then filter out duplicates.
2. Validate each permutation and discard those that are not valid times, if any. For example, there is no such time as 12:63 or 31:14. If using Java, use LocalTime.parse for validation.
3. Sort the times using their natural order.
4. Find the original in the sorted list.
5. Return the subsequent list element.
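The steps above can be sketched in Python roughly as follows. This is an illustration rather than a definitive implementation: the function name is made up, the input is assumed to be a valid zero-padded HH:MM string, and wrapping around to the earliest time after the last one is an assumption of this sketch.

```python
from itertools import permutations


def next_time_same_digits(time_str):
    """Return the next valid HH:MM built from the same digits, wrapping at midnight."""
    digits = time_str.replace(":", "")
    candidates = set()  # a set filters out duplicate permutations
    for p in permutations(digits):
        hours, minutes = int("".join(p[:2])), int("".join(p[2:]))
        # Keep only permutations that form a valid time of day.
        if hours < 24 and minutes < 60:
            candidates.add(f"{hours:02d}:{minutes:02d}")
    # Zero-padded HH:MM strings sort in chronological order.
    ordered = sorted(candidates)
    position = ordered.index(time_str)
    return ordered[(position + 1) % len(ordered)]


print(next_time_same_digits("12:00"))
```

For 12:00 this returns 20:01, since 20:01 is the first valid time after 12:00 among the permutations of 1, 2, 0, 0.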
Edit: The following description does not give you the correct next time in all cases.
It’s possible to go more directly for just finding the next time without finding all times, but it’s a bit complicated. First thing to do is to search the time string from the right for a place where a smaller digit comes before a greater one (in your example you will find that 1 is before 2). Among the digits visited so far, choose the next digit higher than the smaller digit found. In your example, the next digit higher than 1 is 2. Put this digit where the lower digit was and reverse the order of the remaining digits. The reverse of 100 is 001, so your result is 20:01, which I believe is the correct answer. For a different example, 01:20, again 1 is before 2, so put 2 there, reverse 10 into 01 to get 02:01. If you get an invalid time, repeat the process. If there is no smaller digit before a greater one, you have exhausted the possible times of the day. If you want to start over, reverse the entire string: from 21:00 you will get 00:12.
I have an array that looks something like this:
[["Sunday", [user1, user2]], ["Sunday", [user1, user4]], ["Monday", [user3, user2]]]
The array essentially has all permutations of a given day with a unique pair of users. I obtained it by running
%w[Su Mo Tu We Th Fr Sa].product(User.all_pairs)
where User.all_pairs is every unique pair of users.
My goal now is to compose this set of nested arrays into schedules, meaning I want to find every permutation of length 7 with unique days. In other words, I want every potential week. I already have every potential day, and I have every potential pair of users, now I just need to compose them.
I have a hunch that the Array.permutation method is what I need, but I'm not sure how I'd use it in this case. Or perhaps I should use Array.product?
If I understand you correctly, you want all possible weeks where there is one pair of users assigned to each day. You can do it like this:
User.all_pairs.combination(7)
This will give you all possible ways to pick 7 pairs. But if you are asking for every possible week, then it also matters which pair is assigned to which day, so you also have to take every permutation of those 7 pairs:
User.all_pairs.combination(7).map{|week| week.permutation().to_a}.flatten(1)
Now this will give you all possible weeks, where every week is represented as an array containing 7 distinct pairs. For example, one of the weeks may look like this:
[[user1, user2], [user1, user3], [user2, user3], [user3, user4], [user1, user4], [user2, user4], [user1, user5]]
However, the number of weeks will be huge! If you have n users, you will have k = n(n-1)/2 pairs; there are p = k! / (7! * (k - 7)!) ways of selecting 7 pairs and p * 7! possible weeks. With just 5 users that is k = 10 pairs and 604800 possible weeks; with 10 users (k = 45) it is already 228713284800. Beyond a handful of users, enumerating them all won't be practical, no matter what you are planning to do with them.
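To get a feel for how quickly this grows, here is a short Python sketch of the counting. It assumes a pair is an unordered choice of 2 users (so there are n(n-1)/2 of them) and that no pair is used on more than one day; the function name is made up.

```python
from math import comb, perm


def schedule_counts(n_users, days=7):
    """Return (pairs, ways to pick `days` pairs, ordered weeks) for n_users."""
    pairs = comb(n_users, 2)        # unordered pairs of users
    selections = comb(pairs, days)  # which pairs appear in the week
    weeks = perm(pairs, days)       # the same pairs, assigned to specific days
    return pairs, selections, weeks


print(schedule_counts(5))
print(schedule_counts(10))
```

Since weeks equals selections multiplied by 7!, the count explodes as soon as the number of pairs gets much larger than 7.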
If you are trying to find the best schedule for a week, you could try a greedy algorithm instead of enumerating every week.