I am currently working on analyzing a within-subject dataset with 8 time-ordered assessment points for each subject.
The variables of interest in this example are ID, time point, and accident.
I want to create two variables: accident_intercept and accident_slope, based on the value on accident at a particular time point.
For the accident_intercept variable, once a participant indicated the occurrence of an accident (e.g., accident = 1) at a specific time point, I want the values for that time point and the remaining time points to be 1.
For the accident_slope variable, once a participant indicated the occurrence of an accident (e.g., accident = 1) at a specific time point, I want the value of that time point to be 0, but count up by 1 for the remaining time points until the end time point, for each subject.
The main challenge here is that the process stated above needs to be repeated/looped for each participant, and each participant occupies 8 rows of data.
Please see what the newly created variables would look like:
I have looked into the documentation for different SPSS syntax, such as loop and the lag/lead functions. I also tried to break my task into different components and google each one. However, I have not made any progress :)
I would be really grateful for any help and direction that you can provide.
Here is one way to do what you need using aggregate to calculate "accident time":
if accident=1 accidentTime=TimePoint.
aggregate out=* mode=addvariables overwrite=yes /break=ID/accidentTime=max(accidentTime).
if TimePoint>=accidentTime Accident_Intercept=1.
if TimePoint>=accidentTime Accident_Slope=TimePoint-accidentTime.
recode Accident_Intercept Accident_Slope accidentTime (miss=0).
Here is another approach using the lag function:
compute Accident_Intercept=0.
if accident=1 Accident_Intercept=1.
if $casenum>1 and id=lag(id) and lag(Accident_Intercept)=1 Accident_Intercept=1.
compute Accident_Slope=0.
if $casenum>1 and id=lag(id) and lag(Accident_Intercept)=1 Accident_Slope=lag(Accident_Slope) +1.
exe.
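For readers who want to check the expected output outside SPSS, the same logic can be sketched in Python. This is only an illustration, not a translation of either syntax block above: the function name accident_trajectory and the sample flags are made up, and it assumes each subject's rows are already sorted by time point and that the first accident = 1 is the one that counts.

```python
def accident_trajectory(accident_flags):
    """Return (intercept, slope) lists for one subject's time-ordered flags."""
    intercept, slope = [], []
    first = None  # index of the first time point where accident == 1
    for t, flag in enumerate(accident_flags):
        if first is None and flag == 1:
            first = t
        if first is None:
            # No accident reported yet: both variables stay at 0.
            intercept.append(0)
            slope.append(0)
        else:
            # From the accident onward: intercept is 1, slope counts up from 0.
            intercept.append(1)
            slope.append(t - first)
    return intercept, slope


# Hypothetical subject who reports an accident at the 4th of 8 time points.
flags = [0, 0, 0, 1, 0, 0, 0, 0]
intercept, slope = accident_trajectory(flags)
print(intercept)  # [0, 0, 0, 1, 1, 1, 1, 1]
print(slope)      # [0, 0, 0, 0, 1, 2, 3, 4]
```

Calling the function once per subject plays the role of the break=ID / id=lag(id) grouping in the SPSS versions.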
Inlet -> front -> middle -> rear -> outlet
Those five properties each have a value anywhere between 4 and 40. Now I want to find a specific match for each of those values, where the sum of a single property across the pipe pieces is either a full 10 or a 5. There might be hundreds of different pipe pieces, all with different properties.
So if I have all 5 pieces and, when summed, their properties come out like 54,51,23,71,37, that is not good and not what I'm looking for.
Instead 55,50,25,70,40. That would be perfect.
My trouble is that there are so many pieces that it would be insane to do the mixing and matching manually, and new ones come up frequently.
I have manually inserted about 100 of these into SQLite already, but it should be easy to convert them to Excel or another database format, so the answer can be for anything like MySQL or Google Sheets.
I need the calculation that takes every piece in account and results either in "no match" or tells me the id of each piece that is required for a match and if multiple matches are available, it separates them.
Edit: Even just the math needed to do this kind of calculation would be a lot of help here, not much of a math guy myself. I guess there should be a reference piece i need to use and then that gets checked against every possible scenario.
If the value you want to verify is in A1, use: =ROUND(A1/5,0)*5
If the pipes may not be shorter than the given values, use =CEILING(A1,5)
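The formulas above round a single value. The search the question describes could be brute-forced along the following lines. This is a rough sketch under assumptions the question does not confirm: every piece carries the same five properties, any `count` pieces from one pool may be combined, and a "match" means every summed property is a multiple of 5. The piece ids and values are invented.

```python
from itertools import combinations


def find_matches(pieces, count=5, step=5):
    """Return (ids, totals) for every combination whose summed properties
    are all multiples of `step`.

    pieces maps a piece id to a tuple of property values
    (inlet, front, middle, rear, outlet).
    """
    matches = []
    for combo in combinations(sorted(pieces), count):
        # Sum each property position across the chosen pieces.
        totals = tuple(sum(vals) for vals in zip(*(pieces[pid] for pid in combo)))
        if all(total % step == 0 for total in totals):
            matches.append((combo, totals))
    return matches


# Tiny made-up inventory; count=2 keeps the demo readable.
pieces = {
    "p1": (12, 8, 5, 20, 11),
    "p2": (13, 7, 10, 15, 9),
    "p3": (14, 9, 6, 21, 12),
}
result = find_matches(pieces, count=2)
print(result if result else "no match")
# [(('p1', 'p2'), (25, 15, 15, 35, 20))]
```

With about 100 pieces and count=5 this checks roughly 75 million combinations, which is slow in plain Python but still finite. If each position in the chain only accepts certain pieces, the candidate set shrinks considerably.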
I'm not sure whether the LIKE operator can be used to compare strings or if there is another function to achieve this, but this is my case. I have the part below:
R71-14-40000-ATN-LH-D-PF. The third segment (40000) is the length; the first 3 digits are the integer part and the last 2 digits are the decimals.
I would like to get all parts from the DB where the length (third segment) is equal to or greater than that value. For example, if I use the above part I should get the yellow values and omit the other ones (the values can also be R71-14-50000-ATN-LH-D-PF, R71-14-55000-ATN-LH-D-PF, R71-14-60000-ATN-LH-D-PF, etc., not only ones that start with 4).
I tried PartNum like '%R71-14-%-ATN-LH-D-PF%' but I get all parts no matter what the third segment value is.
You can use a substring, I think:
where substring(col, 8, 5) >= substring('R71-14-40000-ATN-LH-D-PF', 8, 5)
Some databases use substr() rather than substring().
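As a quick check of the substring idea, here is a small sketch using Python's built-in sqlite3 module, where the function is spelled substr(). The table name parts and the sample rows are invented for illustration. The comparison is a plain string comparison, which only behaves like a numeric one because the third segment is fixed-width.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (PartNum TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO parts VALUES (?)",
    [
        ("R71-14-35000-ATN-LH-D-PF",),
        ("R71-14-40000-ATN-LH-D-PF",),
        ("R71-14-55000-ATN-LH-D-PF",),
        ("R71-14-60000-ATN-RH-D-PF",),  # other segments differ, so excluded
    ],
)

reference = "R71-14-40000-ATN-LH-D-PF"
rows = conn.execute(
    "SELECT PartNum FROM parts "
    "WHERE PartNum LIKE 'R71-14-_____-ATN-LH-D-PF' "  # fix the other segments
    "AND substr(PartNum, 8, 5) >= substr(?, 8, 5) "   # compare the 3rd segment
    "ORDER BY PartNum",
    (reference,),
).fetchall()
matches = [row[0] for row in rows]
print(matches)
# ['R71-14-40000-ATN-LH-D-PF', 'R71-14-55000-ATN-LH-D-PF']
```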
Using a more restrictive LIKE pattern such as
PartNum LIKE 'R71-14-4____-ATN-LH-D-PF'
would answer the particular query for "values with the 3rd-segment starting with a 4". It could also be ..14-4%-ATN.., although I chose the _ match-exactly-one wildcard for explicitness of a fixed 3rd-segment length (5); it's also easier for the engine to match against.
Expanding this to "equal to or greater than 4" under this fixed-width data can then be done by matching a 3rd-segment that starts with a 4, or 5, or 6..
PartNum LIKE 'R71-14-[456789]____-ATN-LH-D-PF'
This works in SQL Server, although there might be slight variations in different RDBMS implementations. This approach is lexical, which works fine on single-character digits even though it does not perform a numeric comparison. SQL Server also supports character negations that can be useful; see the documentation for the specific RDBMS.
The leading and trailing % are not needed per the shown data. Using a leading % can also be very detrimental to index usage.
The trailing % makes more sense if not caring about the remaining segments,
PartNum LIKE 'R71-14-[456789]____-%'
And if needing to only care about the 3rd-segment,
PartNum LIKE '___-__-[456789]____-%'
PartNum LIKE '___-__-[456789]%' -- or even this
Note the difference from the original query (..14-%-ATN..), which matches all values as expected. This is because it does not add any restrictions to the 3rd-segment value.
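The [456789] character class is not portable to every engine's LIKE. As one example of the "slight variations", SQLite's LIKE has no character classes, but its GLOB operator does, with ? for exactly one character and * for any run of characters. A minimal sketch with Python's sqlite3 module follows; the table name parts and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (PartNum TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO parts VALUES (?)",
    [
        ("R71-14-35000-ATN-LH-D-PF",),
        ("R71-14-40000-ATN-LH-D-PF",),
        ("R71-14-90500-ATN-LH-D-PF",),
    ],
)

# GLOB is case-sensitive and uses ? where LIKE uses _.
rows = conn.execute(
    "SELECT PartNum FROM parts "
    "WHERE PartNum GLOB 'R71-14-[456789]????-ATN-LH-D-PF' "
    "ORDER BY PartNum"
).fetchall()
glob_matches = [row[0] for row in rows]
print(glob_matches)
# ['R71-14-40000-ATN-LH-D-PF', 'R71-14-90500-ATN-LH-D-PF']
```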
I have a database full of facts such as:
overground(newcrossgate,brockley,2).
overground(brockley,honoroakpark,3).
overground(honoroakpark,foresthill,3).
overground(foresthill,sydenham,2).
overground(sydenham,pengewest,3).
overground(pengewest,anerley,2).
overground(anerley,norwoodjunction,3).
overground(norwoodjunction,westcroydon,8).
overground(sydenham,crystalpalace,5).
overground(highburyandislington,canonbury,2).
overground(canonbury,dalstonjunction,3).
overground(dalstonjunction,haggerston,1).
overground(haggerston,hoxton,2).
overground(hoxton,shoreditchhighstreet,3).
example: newcrossgate to brockley takes 2 minutes.
I then created a rule so that if I enter the query istime(newcrossgate,honoroakpark,Z). then prolog should give me the time it takes to travel between those two stations. (The rule I made is designed to calculate the distance between any two stations at all, not just adjacent ones).
istime(X,Y,Z):- istime(X,Y,0,Z); istime(Y,X,0,Z).
istime(X,Y,T,Z):- overground(X,Y,Z), T1 is T + Z.
istime(X,Y,Z):- overground(X,A,T), istime(A,Y,T1), Z is T + T1.
istime(X,Y,Z):- overground(X,B,T), istime(B,X,T1), Z is T + T1.
It seems to work perfectly from newcrossgate to the first couple of stations, e.g. newcrossgate to foresthill or sydenham. However, after testing newcrossgate to westcroydon, which takes 26 mins, I tried newcrossgate to crystalpalace and Prolog said it should take 15 mins... despite the fact that it's the next station after westcroydon. Clearly something's wrong here; it works for most of the stations while coming up with an occasional error in time every now and again. Can anyone tell me what's wrong? :S
This is essentially the same problem as your previous question, the only difference is that you need to accumulate the time as you go.
One thing I see is that your "public" predicate, istime/3 tries to do too much. All it should do is seed the accumulator and invoke the worker predicate istime/4. Since you're looking for route/time in both directions, the public predicate should be just
istime( X , Y , Z ) :- istime( X , Y , 0 , Z ) .
istime( X , Y , Z ) :- istime( Y , X , 0 , Z ) .
The above is essentially the first clause of your istime/3 predicate
istime(X,Y,Z):- istime(X,Y,0,Z); istime(Y,X,0,Z).
The remaining clauses of istime/3, the recursive ones:
istime(X,Y,Z):- overground(X,A,T), istime(A,Y,T1), Z is T + T1.
istime(X,Y,Z):- overground(X,B,T), istime(B,X,T1), Z is T + T1.
should properly be part of istime/4 and have the accumulator present. That's where your problem is.
Give it another shot and edit your question to show the next iteration. If you still can't figure it out, I'll show you some different ways to do it.
Some Hints
Your "worker" predicate will likely look a lot like your earlier "find a route between two stations" exercise, but it will have an extra argument, the accumulator for elapsed time.
There are two special cases. If you use the approach you used in your "find a route between two stations" solution, the special cases are
A and B are directly adjacent.
A and B are connected via at least one intermediate stop.
There's another approach as well, that might be described as using lookahead, in which case the special cases are
A and B are the same, in which case you've arrived.
A and B are not the same and are connected by zero or more intermediate stops.
FWIW, You shouldn't necessarily expect the route with the shortest elapsed time or the minimal number of hops to be the first solution found. Backtracking will produce all the routes, but the order in which they are found has to do with the order in which the facts are stored in the database. A minimal cost search of the graph is another kettle of fish entirely.
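To make the accumulator idea concrete without giving away the Prolog, here is the "lookahead" shape sketched in Python. It is an illustration only, not the answer's withheld solution. The names route_times and neighbours are made up, and the visited set is an added assumption to stop the search from walking back and forth along the same edge.

```python
# Facts from the question: (station, station, minutes).
OVERGROUND = [
    ("newcrossgate", "brockley", 2),
    ("brockley", "honoroakpark", 3),
    ("honoroakpark", "foresthill", 3),
    ("foresthill", "sydenham", 2),
    ("sydenham", "pengewest", 3),
    ("pengewest", "anerley", 2),
    ("anerley", "norwoodjunction", 3),
    ("norwoodjunction", "westcroydon", 8),
    ("sydenham", "crystalpalace", 5),
    ("highburyandislington", "canonbury", 2),
    ("canonbury", "dalstonjunction", 3),
    ("dalstonjunction", "haggerston", 1),
    ("haggerston", "hoxton", 2),
    ("hoxton", "shoreditchhighstreet", 3),
]


def neighbours(station):
    """Yield (next_station, minutes), treating every link as two-way."""
    for a, b, minutes in OVERGROUND:
        if a == station:
            yield b, minutes
        elif b == station:
            yield a, minutes


def route_times(start, goal, elapsed=0, visited=None):
    """Yield the elapsed time of every route from start to goal."""
    visited = (visited or frozenset()) | {start}
    if start == goal:
        yield elapsed  # arrived: the accumulator holds the total
        return
    for nxt, minutes in neighbours(start):
        if nxt not in visited:
            # Carry the running total forward, like the extra argument in istime/4.
            yield from route_times(nxt, goal, elapsed + minutes, visited)


print(list(route_times("newcrossgate", "westcroydon")))    # [26]
print(list(route_times("newcrossgate", "crystalpalace")))  # [15]
```

On these facts crystalpalace branches off at sydenham, so 15 minutes (2 + 3 + 3 + 2 + 5) is the only route from newcrossgate, not an error.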
Have you tried to cycle through answers with ;? 26mins is not the shortest time between newcrossgate and westcroydon...
Edit: my bad! Apparently the shorter results were due to a bug in your code (see my comment about the 4th clause). However, your code is correct here: 15 mins is the shortest route between newcrossgate and crystalpalace. Just because there is a route that goes from newcrossgate to westcroydon and then to crystalpalace doesn't mean it's the shortest route, or the route your program will yield first.
Update: if you're running into problems to find answers to some routes, I'd suggest changing the 3rd clause to:
istime(X,Y,_,Z):- overground(X,A,T), istime(A,Y,T1), Z is T + T1.
The reason is simple: your first clause swaps X with Y, which is good, since with that you're saying the routes are symmetrical. However, the 3rd clause does not benefit from that, because it's never called by the swapped one. Ignoring the 3rd argument (which you're not using anyway) and thus letting the 1st clause call the 3rd might fix your issue, since some valid routes that were not used previously will be now.
(also: I agree with Nicholas Carey's answer, it would be better to use the third argument as an accumulator; but as I said, ignoring it for now might just work)
To make it work you need to do the reverse of both journeys stated in your last clause.
Keep the predicate as it is, istime(X,Y,Z) and just make another clause containing the reverse journeys.
This way it works with all the stations. (Tried and Tested)
So it's a standard in basically every address form out there and I'm questioning why?
Address Line 2. It's in every form that asks for address details. It's never actually seemed necessary to me. It requires another field in the database and all the goofy maintenance that goes with it. Every time you use an address, you have to concatenate it and 99% of the time line 2 is empty. The other 1% of the time you could've just put it into line 1.
Instead of calling it line 2, couldn't it be called something with clearer semantics... like "apartment number"?
It ruins the semantics of the whole address concept. You don't really know what you have in either field. Except maybe that the concatenation of the two fields results in a "plain old address". But "Line 1" and "Line 2" themselves don't really have any meaning. Is something "supposed" to go in each respectively? I've never seen it. Why don't we have address line 3 while we're at it?
I've been thinking about it and realized that as a result, I don't really trust the address data in my database. The whole field is flaky in general because you can't really do validation on it (some addresses have roads and a house number, others have streets and avenues). These days, though, you could do something like validate the field against a geolocation API. But simply because of the "Line 2" thing, you can't really be certain what you're doing. Should I combine the two (line 1 + line 2), then validate? What do I do with the user's original input if I'm correcting them ("did you mean xxx")? Do I just say, "yah, address line 2 doesn't really do anything... I just took your validated input and dumped it into line 1"? Why am I even giving the end user (and myself) the chance to be confused?
The way I see it, the field should either be an address (street + house number), or if we're going to split things up, do it properly and ask for the road and house number independently.
Allowing loose data input is never a good idea. If you must support a multi-line address, use 2 text boxes called address1 and address2. Do not use a non-structured input format (textarea) to collect structured information (an address).
Actually, in rare cases a user might even want to have a third address line. The best solution to this is to use a <textarea> that will accept newlines for a more complex address and store the address exactly as entered in the database.
Address line one is sometimes used by companies for an attention name, which makes address line 2 necessary for the address itself. Imagine something like:
Name: Microsoft
Address 1: Att.: Bill Gates
Address 2: One Microsoft Way
...
It isn't always an apartment number. It could be a floor (single house, multiple residences), or other things.