What is the way to select only the first value in InfluxDB sequences?

I have a measurement that indicates the process of changing the state of something. Every second the system asks whether something is changing; if so, it stores 1 in the database, otherwise nothing. So I have sequences of "ones".
As here. The distance between points is 1 s.
I want to get only the time of the first point of each "one" sequence. In this particular example it would be
Time Value
2019-01-01 11:46:55 1
2019-01-01 12:36:45 1
In red squares
Is there a way to do it using queries? Or maybe an easy Python pattern?
P.S. The first() selector requires GROUP BY, but I cannot assume that the sequences are shorter than some_time_interval.

You could probably achieve what you need with a nested query and the difference function. https://docs.influxdata.com/influxdb/v1.7/query_language/functions/#difference
For example:
Let's say your measurement is called change and the field is called value
SELECT diff FROM
(SELECT difference(value) AS diff FROM
(SELECT max(value) AS value FROM change WHERE your_time_filter GROUP BY time(1s) fill(0))
)
WHERE diff > 0
So you first fill in the empty spaces with zeros. Then take the differences between subsequent values. Then pick the differences that are positive. If the difference is 0 then it is between two zeros or two ones, no change. If it is 1 then the change is from zero to one (what you need). If it is -1 then the change is from one to zero, end of sequence of ones.
You may need to tweak the query a bit with time intervals and groupings.
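Since the question also asks for an easy Python pattern, here is a minimal sketch that does the same thing client-side. It assumes the timestamps have already been fetched in ascending order and that points inside a run are spaced exactly 1 s apart; a new run starts whenever the gap to the previous point is larger than that.

```python
from datetime import datetime, timedelta

def run_starts(timestamps, gap=timedelta(seconds=1)):
    """Return the first timestamp of each consecutive run.

    Assumes `timestamps` is sorted ascending; a new run starts
    whenever the spacing to the previous point exceeds `gap`.
    """
    starts = []
    prev = None
    for t in timestamps:
        if prev is None or t - prev > gap:
            starts.append(t)
        prev = t
    return starts

# Two runs of "ones", mimicking the example in the question.
points = [datetime(2019, 1, 1, 11, 46, 55) + timedelta(seconds=i) for i in range(5)]
points += [datetime(2019, 1, 1, 12, 36, 45) + timedelta(seconds=i) for i in range(3)]
print(run_starts(points))  # [datetime(2019, 1, 1, 11, 46, 55), datetime(2019, 1, 1, 12, 36, 45)]
```

This avoids the fill(0)/difference() dance entirely, at the cost of pulling all the points into the client.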

Related

(SPSS) Assign values to remaining time point based on value on another variable, and looping for each case

I am currently working on analyzing a within-subject dataset with 8 time-ordered assessment points for each subject.
The variables of interest in this example are ID, time point, and accident.
I want to create two variables: accident_intercept and accident_slope, based on the value on accident at a particular time point.
For the accident_intercept variable, once a participant indicated the occurrence of an accident (e.g., accident = 1) at a specific time point, I want the values for that time point and the remaining time points to be 1.
For the accident_slope variable, once a participant indicated the occurrence of an accident (e.g., accident = 1) at a specific time point, I want the value of that time point to be 0, but count up by 1 for the remaining time points until the end time point, for each subject.
The main challenge here is that the process stated above needs to be repeated/looped for each participant, who occupies 8 rows of data.
Please see how the newly created variables would look:
I have looked into the documentation for different SPSS syntax, such as LOOP and the lag/lead functions. I also tried to break my task into different components and google each one. However, I have not made any progress :)
I would be really grateful for any help and direction you can provide.
Here is one way to do what you need using aggregate to calculate "accident time":
if accident=1 accidentTime=TimePoint.
aggregate out=* mode=addvariables overwrite=yes /break=ID/accidentTime=max(accidentTime).
if TimePoint>=accidentTime Accident_Intercept=1.
if TimePoint>=accidentTime Accident_Slope=TimePoint-accidentTime.
recode Accident_Slope accidentTime (miss=0).
Here is another approach using the lag function:
compute Accident_Intercept=0.
if accident=1 Accident_Intercept=1.
if $casenum>1 and id=lag(id) and lag(Accident_Intercept)=1 Accident_Intercept=1.
compute Accident_Slope=0.
if $casenum>1 and id=lag(id) and lag(Accident_Intercept)=1 Accident_Slope=lag(Accident_Slope) +1.
exe.
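For illustration, the lag-based logic above can be sketched in plain Python (the data here is hypothetical; the variable names Accident_Intercept and Accident_Slope are taken from the question). Once accident = 1 is seen for an ID, the intercept stays 1 and the slope counts up from 0, resetting for each new ID:

```python
def add_accident_vars(rows):
    """rows: list of dicts with 'ID' and 'accident', time-ordered within each ID.

    Mirrors the SPSS lag logic: once accident=1 occurs for an ID,
    Accident_Intercept stays 1 and Accident_Slope counts up from 0.
    """
    prev = None
    for row in rows:
        # "carried" plays the role of lag(Accident_Intercept)=1 within the same ID
        carried = prev is not None and prev["ID"] == row["ID"] and prev["Accident_Intercept"] == 1
        row["Accident_Intercept"] = 1 if row["accident"] == 1 or carried else 0
        row["Accident_Slope"] = prev["Accident_Slope"] + 1 if carried else 0
        prev = row
    return rows

rows = [
    {"ID": 1, "accident": 0},
    {"ID": 1, "accident": 1},
    {"ID": 1, "accident": 0},
    {"ID": 2, "accident": 0},
]
add_accident_vars(rows)
print([(r["Accident_Intercept"], r["Accident_Slope"]) for r in rows])
# [(0, 0), (1, 0), (1, 1), (0, 0)]
```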

What is an efficient way to cross-compare two different data pairs in Excel to spot differences?

Summary
I am looking to compare two data sets within Excel, and produce an output depending on which has changed, and what to.
More info
I hold two databases, which are updated independently. I cross-compare these databases monthly to see which database(s) have changed and which holds the most accurate data. The other database is then amended to reflect the correct value. I am trying to automate the process of deciding which database needs to be updated. I'm comparing not just data change, but data change over time.
Example
On month 1, database 1 contains the value "Foo". Database 2 also contains the value "Foo". On month 2, database 1 now contains the value "Bar", but database 2 still contains the value "Foo". I can ascertain that because database 1 holds a different value, but last month they held the same value, database 1 has been updated, and database 2 should be updated to reflect this.
Table Example
Data1 Month1 | Data2 Month1 | Data1 Month2 | Data2 Month2 | Database to update | Reason
Foo | Foo | Foo | Foo | None | All match
Apple | Apple | Orange | Apple | Data2 | Data1 has new data when they did match previously. Data2 needs to be updated with the new info.
Cat | Dog | Dog | Dog | None | They mismatched previously, but both databases now match.
1 | 1 | 1 | 2 | Data1 | Data2 has new data when they did match previously. Data1 needs to be updated with the new info.
AAA | BBB | AAA | BBB | CHECK | Both databases should match, but you cannot ascertain which should be updated.
ABC | ABC | DEF | GHI | CHECK | Both databases changed, but you cannot tell if Data1 or Data2 is correct as they were updated at the same time.
Current logic
Currently, I'm trying to get this to work using multiple nested =IF statements, combined with some =AND and =NOT statements. Essentially, an example part of the statement would be (database 1, month 1 = DB1M1, etc.): =IF(AND(DB1M1=DB2M1,DB2M1=DB2M2),"None",IF(AND(DB1M1=DB2M1,DB1M1=DB2M2,NOT(DB2M1=DB1M2)),"Data2",IF(ETC,ETC,ETC).
I've had some success with this, but due to the length of the statement, it is very messy and I'm struggling to make it work, as it becomes unreadable for me trying to calculate the possible outcomes in just =IF clauses. I also have no doubt it's incredibly inefficient, and I'd like to make it more efficient, especially considering the size of the database is around 10,000 lines.
Final Notes / Info
I'd appreciate any help with getting this to work. I'm keen to learn, so any tips and advice are always welcomed.
I'm using MSO 365, version 2202 (I cannot update beyond this). This will be run in the Desktop version of Excel. I would prefer this is done exclusively using formulas, but I am open to using Visual Basic if it would be otherwise impossible or incredibly inefficient. Thanks!
Similar scenarios remind me of using bitwise operations or binary numbers. The main idea behind a binary number is that each digit can act as a flag indicating whether a certain property is present.
The goal is to identify whether the two databases (DB1, DB2) are in sync for a given value over the two periods (M1, M2). If one database is out of sync, we would like to know which action to carry out to bring it in sync with the other database. Similarly, we would like to know when both databases are out of sync at the end of the period.
Here is the Excel solution in cell M2; then extend the formula down:
=LET(dec,
BIN2DEC(IF(B2=C2,0,1)&IF(D2=E2,0,1)&IF(B2=D2,0,1)&IF(C2=E2,0,1)),
DBsOnSync, ISNUMBER(FIND(dec, "0;10;3;9;11")),
DBsOutOfSync, ISNUMBER(FIND(dec, "7;12;13;14;15")),
IFERROR(IFS(dec=5,"Update DB1", dec=6,"Update DB2", DBsOnSync=TRUE,
"DBs on Sync", DBsOutOfSync=TRUE, "DBs out of Sync"), "Case not defined")
)
The input table tries to consider all possible combinations, so we can build the logic. The highlighted columns are not really necessary; they are just for illustrative or testing purposes. The rows in red are combinations already covered previously, so they do not really need to be taken into account.
Explanation
We build a binary number based on the following conditions for each binary digit. This is just an intermediate result to convert it to a decimal number via BIN2DEC and determine the case for each possible value.
BIN2DEC(IF(B2=C2,0,1)&IF(D2=E2,0,1)&IF(B2=D2,0,1)&IF(C2=E2,0,1))
We have four conditions, so we build a binary number of length 4, where each digit represents a flag condition (0 = equal, 1 = not equal).
We build the binary number that will be the input for BIN2DEC by concatenating the logical conditions we are looking for. Each IF condition represents a binary digit, from left to right:
IF(B2=C2,0,1) checks whether DB1 and DB2 are consistent in M1 (intermediate calculation shown in column M1).
IF(D2=E2,0,1) checks whether DB1 and DB2 are consistent in M2 (intermediate calculation shown in column M2).
IF(B2=D2,0,1) checks whether DB1 keeps consistency over time (intermediate calculation shown in column DB1).
IF(C2=E2,0,1) checks whether DB2 keeps consistency over time (intermediate calculation shown in column DB2).
Converting the binary number to decimal, we can identify each case by assigning it a decimal number or set of decimal numbers:
dec | Scenario
0, 10, 3, 9, 11 | DBs on Sync
5 | Update DB1
6 | Update DB2
7, 12, 13, 14, 15 | DBs out of Sync
We use IFS and FIND to identify each case based on the dec value: FIND looks for dec in the string that represents the set of possible numbers for each case, and ISNUMBER checks whether the number was found. As a last resort, for testing purposes, the formula returns Case not defined if some case has not been covered yet.
Notes
Columns F:I give a hint about the maximum number of possible combinations. We have four columns with only two possible values each: Sync, NotSync. That represents 2*2*2*2 = 16 combinations, the maximum number of binary numbers of size 4 we can have (we have four conditions).
As you can see from the screenshot, we have fewer unique combinations (12). The reason is that the way we build the binary numbers introduces dependencies, so some combinations are impossible.
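The same flag-and-classify logic can be expressed outside Excel. Here is a sketch in Python that mirrors the LET formula, assuming the column order B:E is Data1 Month1, Data2 Month1, Data1 Month2, Data2 Month2 (the decimal sets come from the table above):

```python
def classify(b2, c2, d2, e2):
    """Mirror of the LET formula: build a 4-bit number from the four
    equality flags (0 = equal, 1 = not equal) and map it to a case."""
    bits = (int(b2 != c2), int(d2 != e2), int(b2 != d2), int(c2 != e2))
    # Leftmost flag is the most significant bit, as in BIN2DEC
    dec = bits[0] * 8 + bits[1] * 4 + bits[2] * 2 + bits[3]
    if dec == 5:
        return "Update DB1"
    if dec == 6:
        return "Update DB2"
    if dec in (0, 3, 9, 10, 11):
        return "DBs on Sync"
    if dec in (7, 12, 13, 14, 15):
        return "DBs out of Sync"
    return "Case not defined"

print(classify("Apple", "Apple", "Orange", "Apple"))  # Update DB2
```

Checking this against the example table: (Foo, Foo, Foo, Foo) gives dec = 0 (on sync), (1, 1, 1, 2) gives dec = 5 (update DB1), and (AAA, BBB, AAA, BBB) gives dec = 12 (out of sync), matching the expected outcomes.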

How do I subtract values in the SAME variable column in SPSS?

So I'm trying to create a new variable column of ''first differences'' by subtracting values in the SAME column, but have no clue how to do so in SPSS. For example, in this picture:
1st value - 0 = 0 (obviously). 2nd value - 1st value = ..., 3rd value - 2nd value = ..., 4th value - 3rd value = ..., and so on.
Also, if there is a negative number, does SPSS allow me to log it/regress it? Once I find the first difference, I'm going to log it and then regress it. For context, the reason I'm doing this is as part of a bigger equation to find out how economic growth and a CHANGE in economic growth (hence the first difference and log) will affect the variable I'm studying.
Thanks.
To calculate differences between values in consecutive rows use this:
if $casenum>1 diffs = FinalConsumExp - lag(FinalConsumExp).
execute.
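For reference, the same first-difference logic in plain Python (a sketch for illustration only; the first row keeps a difference of 0, as in the question):

```python
def first_differences(values):
    """First value's difference is 0 (nothing before it to subtract),
    then each value minus the previous one."""
    return [0 if i == 0 else v - values[i - 1] for i, v in enumerate(values)]

print(first_differences([0, 5, 3, 10]))  # [0, 5, -2, 7]
```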
If you need help with additional problems please start a separate question for each problem.
HTH.

Need an optimized way to handle combination of entities to improve performance

So, I am working on a feature in a web application. The problem is like this:
I have four different entities. Let's say those are Item1, Item2, Item3, Item4. There are two phases of the feature. Let's say the first phase is "Choose entities". In the first phase, the user will have the option to choose multiple items for each entity, and for every combination from that choice, I need to do some calculation. Then in the second phase (let's say the "Relocate" phase), based on the calculation done in the first phase, for each combination I would have to let the user choose another combination, to which the value of the first combination would get moved.
Here's the data model for further clarification -
EntityCombinationTable
(
Id
Item1_Id,
Item2_Id,
Item3_Id,
Item4_Id
)
ValuesTable
(
Combination_Id,
Value
)
So suppose I have the following values in both tables -
EntityCombinationTable
Id -- Item1_Id -- Item2_Id -- Item3_Id -- Item4_Id
1 1 1 1 1
2 1 2 1 1
3 2 1 1 1
4 2 2 1 1
ValuesTable
Combination_Id -- Value
1 10
2 0
3 0
4 20
So if in the first phase I choose (1,2) for Item1, (1,2) for Item2, and 1 for both Item3 and Item4, then the total number of combinations would be 2*2*1*1 = 4.
Then in the second phase, for each of the combination that has value greater than zero, I would have to let the user choose different combination where the values would get relocated.
For example - as only the combinations with Id 1 and 4 have values greater than zero, only two relocation combinations would need to be shown in the second dialog. So if the user chooses (3,3,3,3) and (4,4,4,4) as relocation combinations in the second phase, then new rows will need to be inserted into
EntityCombinationTable for (3,3,3,3) and (4,4,4,4). And the values of (1,1,1,1) and (2,2,1,1) will be relocated respectively to the rows corresponding to (3,3,3,3) and (4,4,4,4) in the ValuesTable.
So the problem is that each of the entities can have up to 100 items or even more. So in the worst case the total number of combinations can be 10^8, which would lead to a very heavy load on the database (inserting and updating a huge number of rows) and also generating all the combinations at the code level would require substantial time.
I have thought about an alternative approach: not keeping the items as combinations, but rather keeping a separate table for each entity and then building the combinations at runtime. That would also cause performance issues, as there are a lot of other stages where I might need the combinations, so every time I would need to generate all of them.
I have also thought about creating a key-value pair type table, where I would keep the combination as a string. But in this approach I am not actually reducing the number of rows to be inserted; rather, the number of columns is getting reduced.
So my question is: is there any better approach to this kind of situation, where I can keep track of the combinations and manipulate them in an optimized way?
Note - I am not sure if this will help or not, but a lot of the rows in the values table will probably have zero as the value. So in the second phase we would need to show far fewer rows than the actual number of possible combinations.
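The runtime-generation alternative described above, combined with the observation that most values are zero, can be sketched as follows. The idea is to generate combinations lazily and persist only the ones with a non-zero value, rather than materializing all 10^8 rows; the function and table shapes here are hypothetical, using the example data from the question (combination Ids 1 and 4 hold the non-zero values):

```python
from itertools import product

def nonzero_combinations(item1, item2, item3, item4, values):
    """Generate combinations lazily; yield only those with a value > 0.

    `values` maps a combination tuple to its stored value; combinations
    absent from the map are treated as zero, so nothing is ever
    materialized for them.
    """
    for combo in product(item1, item2, item3, item4):
        v = values.get(combo, 0)
        if v > 0:
            yield combo, v

# Only non-zero rows are stored, mirroring a sparse ValuesTable.
values = {(1, 1, 1, 1): 10, (2, 2, 1, 1): 20}
result = list(nonzero_combinations([1, 2], [1, 2], [1], [1], values))
print(result)  # [((1, 1, 1, 1), 10), ((2, 2, 1, 1), 20)]
```

Since the generator never builds the full 2*2*1*1 (or 100^4) cross product in memory as a list, and zero-valued combinations are never inserted at all, both the database row count and the in-memory cost scale with the number of non-zero combinations rather than with all possible ones.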

Getting random entry from Objectify entity

How can I get a random element out of a Google App Engine datastore using Objectify? Should I fetch all of an entity's keys and choose randomly from them or is there a better way?
Assign a random number between 0 and 1 to each entity when you store it. To fetch a random record, generate another random number between 0 and 1, and query for the smallest entity with a random value greater than that.
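A minimal in-memory sketch of this idea (the sorted list stands in for an entity kind with an indexed random property; all names are hypothetical):

```python
import bisect
import random

def pick_random(entities):
    """entities: list of (random_value, entity) tuples kept sorted by
    random_value, mimicking an indexed 'random' property in the datastore.
    Draw r in [0, 1) and return the first entity whose value is >= r."""
    r = random.random()
    keys = [rv for rv, _ in entities]
    i = bisect.bisect_left(keys, r)
    if i == len(entities):  # no entity with value >= r: wrap around
        i = 0
    return entities[i][1]

store = sorted((random.random(), name) for name in ["a", "b", "c"])
print(pick_random(store))
```

One caveat worth noting: each entity's selection probability equals the gap between its random value and its predecessor's, so the result is only approximately uniform unless the stored random values are re-drawn periodically.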
You don't need to fetch all.
For example:
countall = query(X.class).count()
// http://groups.google.com/group/objectify-appengine/browse_frm/thread/3678cf34bb15d34d/82298e615691d6c5?lnk=gst&q=count#82298e615691d6c5
rnd = Generate random number [0..countall]
ofy.query(X.class).order("-date").limit(rnd); // for example -date or some chronologically indexed field
The last id returned is the one you want.
(On average you fetch 50% of the records, or at least the first read is on average 50% smaller.)
Improvements (to keep a smaller key table in cache):
After the first read, remember every X-th element.
Cache the ids and their positions, so next time you can start the query from the closest cached id (the maximum .limit(rnd % X) will be X-1).
Random is just random: if it doesn't need to be close to 100% fair, you can speculate on a chronological field value (for example, if you have 1000 records over 10 days, for random number 501 select the second element greater than the fifth day).
Another option, if you have a chronological field such as a date: fetch elements older than a random date and younger than that random date + 1 (you need to know the first and last dates), then select randomly among the fetched records. If the query is empty, select greater than, etc.
Quoted from this post about selecting some random elements from an Objectified datastore:
If your ids are sequential, one way would be to randomly select 5
numbers from the id range known to be in use. Then use a query with an
"in" filter().
If you don't mind the 5 entries being adjacent, you can use count(),
limit(), and offset() to randomly find a block of 5 entries.
Otherwise, you'll probably need to use limit() and offset() to
randomly select one entry out at a time.
-- Josh
I pretty much adapted the algorithm provided by Matejc. However, three things:
Instead of using count() or the datastore service factory (DatastoreServiceFactory.getDatastoreService()), I have an entity that keeps track of the total count of the entities that I am interested in. The reasons for this approach are:
a. count() can be expensive when you are dealing with a lot of objects
b. You can't test the datastore service factory locally... testing in prod is just a bad practice.
Generating the random number: ThreadLocalRandom.current().nextLong(1, maxRange)
Instead of using limit(), I use offset, so I don't have to worry about "sorting."