How do I normalize temporal data to their initial values?

I have data that are acquired every 3 seconds. Initially they always begin within a narrow baseline range (e.g. 100±10), but after ~30 seconds they begin to increase in value.
Here's an example.
The issue is that, for every experiment, the initial baseline value may start at a different point on the y-axis (e.g. 100, 250, or 35) due to variations in equipment calibration.
Although the relative signal enhancement at ~30 seconds behaves the same across different experiments, there may be an offset along the y-axis.
My intention is to measure the AUC of these curves. Because of the offset between experiments, they are not comparable, although they could potentially be identical in shape and enhancement ratio.
Therefore I need to normalize the data so that, regardless of offset, they all share a comparable initial baseline value; this could be set to 0.
Can you give me any suggestions on how to accomplish this normalization in MATLAB?
Ideally the output should express relative signal enhancement, in percent relative to baseline.
For example, the baseline values above would hover around 0±10 (instead of the raw original value of ~139) and with enhancement they would build up to ~65% (instead of the original raw value of ~230).
Sample data:
index SQMean
_____ ____________
'0' '139.428574'
'1' '133.298706'
'2' '135.961044'
'3' '143.688309'
'4' '133.298706'
'5' '133.181824'
'6' '134.896103'
'7' '146.415588'
'8' '142.324677'
'9' '128.168839'
'10' '146.116882'
'11' '146.766235'
'12' '134.675323'
'13' '138.610382'
'14' '140.558441'
'15' '128.662338'
'16' '138.480515'
'17' '153.610382'
'18' '156.207794'
'19' '183.428574'
'20' '220.324677'
'21' '224.324677'
'22' '230.415588'
'23' '226.766235'
'24' '223.935059'
'25' '229.922073'
'26' '234.389618'
'27' '235.493500'
'28' '225.727280'
'29' '241.623383'
'30' '225.805191'
'31' '240.896103'
'32' '224.090912'
'33' '230.467529'
'34' '248.285721'
'35' '233.779221'
'36' '225.532471'
'37' '247.337662'
'38' '233.000000'
'39' '241.740265'
'40' '235.688309'
'41' '238.662338'
'42' '236.636368'
'43' '236.025970'
'44' '234.818176'
'45' '240.974030'
'46' '251.350647'
'47' '241.857147'
'48' '242.623383'
'49' '245.714279'
'50' '250.701294'
'51' '229.415588'
'52' '236.909088'
'53' '243.779221'
'54' '244.532471'
'55' '241.493500'
'56' '245.480515'
'57' '244.324677'
'58' '244.025970'
'59' '231.987015'
'60' '238.740265'
'61' '239.532471'
'62' '232.363632'
'63' '242.454544'
'64' '243.831161'
'65' '229.688309'
'66' '239.493500'
'67' '247.324677'
'68' '245.324677'
'69' '244.662338'
'70' '238.610382'
'71' '243.324677'
'72' '234.584412'
'73' '235.181824'
'74' '228.974030'
'75' '228.246750'
'76' '230.519485'
'77' '231.441559'
'78' '236.324677'
'79' '229.935059'
'80' '238.701294'
'81' '236.441559'
'82' '244.350647'
'83' '233.714279'
'84' '243.753250'

Close to what was mentioned by Shai:
blwindow = 1:nrSamp;                             % indices of the baseline samples
DataNorm = 100*(Data/mean(Data(blwindow)) - 1);  % percent change relative to baseline
Set the window to an appropriate size; how you determine it depends on your data. The output DataNorm is in percent.

Usually this kind of problem requires more specific knowledge about the data you are measuring (range, noise level, whether you know when the actual signal starts, etc.) and the results you are trying to achieve. However, based on your question alone and your example graph, I'd do something like this (assuming your data is in two arrays, time and data):
initialTimeMax = 25; % take first 25 s
baseSample = data(time <= initialTimeMax); % take part of the data corresponding to the first 25 s
baseSampleAverage = mean(baseSample); % take average to deal with noise
data = data - baseSampleAverage;
If you don't know when your data starts, you can apply a smoothing filter, then take a derivative, find the x-position of its maximum, and set initialTimeMax to this x-position.
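For readers working outside MATLAB, here is a minimal NumPy sketch of the same idea, combining the baseline average with the percent conversion. The function name and the 25-second baseline window are assumptions for illustration, not anything from the thread:

import numpy as np

def normalize_to_baseline(data, sample_period=3.0, baseline_seconds=25.0):
    """Express a signal as percent enhancement over its initial baseline."""
    data = np.asarray(data, dtype=float)
    n_base = max(1, int(baseline_seconds / sample_period))  # number of baseline samples
    baseline = data[:n_base].mean()                         # average them to suppress noise
    return 100.0 * (data / baseline - 1.0)                  # percent change vs. baseline

With the sample data above, the baseline (~139) maps to roughly 0% and the plateau (~230) to roughly 65%.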

Related

P-values for glmer mixed effects logistic regression in Python

I have a dataset for one year for all employees with individual-level data (e.g. age, gender, promotions, etc.). Each employee is in a team of a certain manager. I have some variables on the team- and manager-levels as well (e.g. manager's tenure, team diversity, etc.). I want to explain the termination of employees (binary: left the company or not). I am running a multilevel logistic regression, where employees are grouped by their managers, therefore they share the same team- and manager-level characteristics.
So, my model looks like:
Termination ~ Age + Time in company + Promotions + Manager tenure + Percent of employees who completed training, with employees grouped by Manager_ID
Dataset example:
import pandas as pd

data = {'Employee': ['ID1', 'ID2', 'ID3', 'ID4', 'ID5', 'ID6', 'ID7', 'ID8'],
        'Manager_ID': ['MID1', 'MID2', 'MID2', 'MID1', 'MID3', 'MID3', 'MID3', 'MID1'],
        'Termination': ['0', '0', '0', '0', '1', '1', '1', '0'],
        'Age': ['35', '40', '50', '24', '33', '46', '44', '31'],
        'TimeinCompany': ['1', '3', '10', '20', '4', '0', '4', '9'],
        'Promotions': ['1', '0', '0', '0', '1', '1', '1', '0'],
        'Manager_Tenure': ['10', '5', '5', '10', '8', '8', '8', '10'],
        'PercentCompletedTrainingTeam': ['40', '20', '20', '40', '49', '49', '49', '40']}
columns = ['Employee', 'Manager_ID', 'Termination', 'Age', 'TimeinCompany',
           'Promotions', 'Manager_Tenure', 'PercentCompletedTrainingTeam']
data = pd.DataFrame(data, columns=columns)
I managed to run a mixed effects logistic regression using the lme4 package from R in Python (via rpy2).
from rpy2.robjects import r, Formula
from rpy2.robjects.packages import importr

importr('lme4')
model1 = r.glmer(
    formula=Formula('Termination ~ Age + TimeinCompany + Promotions + Manager_Tenure '
                    '+ PercentCompletedTrainingTeam + (1 | Manager_ID)'),
    data=data)
print(r.summary(model1))
I receive the following output for the full sample:
REML criterion at convergence: 54867.6
Scaled residuals:
    Min      1Q  Median      3Q     Max
-2.9075 -0.3502 -0.2172 -0.0929  3.9378
Random effects:
 Groups     Name        Variance Std.Dev.
 Manager_ID (Intercept) 0.005033 0.07094
 Residual               0.072541 0.26933
Number of obs: 211974, groups: Manager_ID, 24316
Fixed effects:
                               Estimate   Std. Error  t value
(Intercept)                    0.14635573 0.00299341   48.893
Age                           -0.00112153 0.00008079  -13.882
TimeinCompany                 -0.00238352 0.00010314  -23.110
Promotions                    -0.01754085 0.00491545   -3.569
Manager_Tenure                -0.00044373 0.00010834   -4.096
PercentCompletedTrainingTeam  -0.00014393 0.00002598   -5.540
Correlation of Fixed Effects:
            (Intr)  Age    TmnCmpny Promotions Mngr_Tenure
Age         -0.817
TmnCmpny     0.370 -0.616
Promotions  -0.011 -0.009 -0.033
Mngr_Tenure -0.279  0.013 -0.076    0.035
PrcntCmpltT -0.309 -0.077 -0.021   -0.042     0.052
But there are no p-values displayed. I have read in many places that lme4 does not provide p-values for a number of reasons, but I need them for a work presentation.
I tried several possible solutions that I found, but none of them worked:
importr('lmerTest')
importr('afex')
print(r.anova(model1))
does not display any output
print(r.anova(model1, ddf="Kenward-Roger"))
only displays npar, Sum Sq, Mean Sq and F value
print(r.summary(model1, ddf="merModLmerTest"))
provides the same output as plain summary
print(r.anova(model1, "merModLmerTest"))
only displays npar, Sum Sq, Mean Sq and F value
Any ideas on how to get p-values are much appreciated.
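An aside beyond what the thread provides: with this sample size (~212k observations), a common pragmatic workaround is to compute Wald p-values directly from the coefficient table, treating the t values as z statistics. This is a sketch under that assumption, not the Satterthwaite or Kenward-Roger machinery that lmerTest implements:

import numpy as np
from scipy import stats

# Coefficient table transcribed from the summary output above.
terms = ['(Intercept)', 'Age', 'TimeinCompany', 'Promotions',
         'Manager_Tenure', 'PercentCompletedTrainingTeam']
estimates = np.array([0.14635573, -0.00112153, -0.00238352,
                      -0.01754085, -0.00044373, -0.00014393])
std_errors = np.array([0.00299341, 0.00008079, 0.00010314,
                       0.00491545, 0.00010834, 0.00002598])

z = estimates / std_errors         # Wald z statistics (the t values above)
p = 2 * stats.norm.sf(np.abs(z))   # two-sided p-values from the normal tail
for term, zi, pi in zip(terms, z, p):
    print(f'{term:<28} z = {zi:8.3f}  p = {pi:.3g}')

With thousands of groups and hundreds of thousands of rows, a degrees-of-freedom correction would change these p-values only negligibly.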

When creating a tensor with an array of timestamps, the numbers are incorrect

Looking for some kind of solution to this issue:
trying to create a tensor from an array of timestamps
[
  1612892067115
]
but here is what happens
tf.tensor([1612892067115]).arraySync()
> [ 1612892078080 ]
as you can see, the result is incorrect.
Somebody pointed out that I may need to use the int64 datatype, but this doesn't seem to exist in tfjs 😭
I have also tried dividing my timestamp down to a small float, but I get a similar result:
tf.tensor([1.612892067115, 1.612892068341]).arraySync()
[ 1.6128920316696167, 1.6128920316696167 ]
If you know a way to work around using timestamps in a tensor, please help :)
Edit:
As an attempted workaround, I tried to remove the year and month from my timestamp.
Here are my subsequent input values:
[
56969701,
56969685,
56969669,
56969646,
56969607,
56969602
]
and their outputs:
[
56969700,
56969684,
56969668,
56969648,
56969608,
56969600
]
as you can see, they are still incorrect, even though the values are well within the acceptable range
I found a solution that worked for me:
Since I only require a subset of the timestamp (just the date / hour / minute / second / ms) for my purposes, I simply truncate out the year / month:
export const subts = (ts: number) => {
  // a sub-timestamp which can be used over the period of a month
  const yearMonth = +new Date(new Date().getFullYear(), new Date().getMonth())
  return ts - yearMonth
}
then I can use this with:
const subTimestamps = timestamps.map(ts => subts(ts))
const x_vals = tf.tensor(subTimestamps, [subTimestamps.length], 'int32')
now all my results work as expected.
Currently only int32 is supported by tensorflow.js, and your data has gone outside the range an int32 can represent.
Until int64 is supported, this can be worked around by using relative timestamps. A JavaScript timestamp is the number of ms elapsed since 1 January 1970. A relative timestamp instead uses another origin and stores the difference in ms elapsed since that date. That way we get a smaller number that can be represented using int32. The best origin to take is the starting date of the records.
const a = Date.now() // too large for int32; putting it straight into a tensor gives an inaccurate result
const origin = new Date("02/01/2021").getTime()
const relative = a - origin
const tensor = tf.tensor(relative, undefined, 'int32')
// get back the data
const data = tensor.dataSync()[0]
// recover the initial date
const initialDate = new Date(data + origin)
In other scenarios, if ms precision is not of interest, using the number of seconds elapsed since the start would be better still; seconds counted from the epoch are known as Unix time.
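An aside beyond the thread: the exact value the asker saw can be reproduced with plain float32 rounding. A tfjs tensor defaults to float32, whose 24-bit significand cannot represent integers this large, and int32 cannot hold them at all. A quick NumPy check (the 2021-02-01 UTC epoch value below is an assumed origin, matching the answer's example):

import numpy as np

ts = 1612892067115            # the timestamp from the question
print(int(np.float32(ts)))    # 1612892078080 -- the "incorrect" value observed above

origin = 1612137600000        # 2021-02-01 00:00:00 UTC in ms (assumed origin)
print(np.int32(ts - origin))  # 754467115 -- fits comfortably in int32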

Exception: decimal.InvalidOperation raised when saving a Django data model

I am storing cryptocurrency data in a Django data model (using a Postgres database). The vast majority of the records save successfully, but on one record in particular I get a decimal.InvalidOperation exception.
The weird thing is, I can't see anything different about the values in the problematic record compared with the ones that save successfully. I have included a full stack trace on Pastebin. Before the data is saved, I output the raw values to the debug log. Below are the data model I'm saving to and the code that saves the data.
I'm stumped! Anyone know what the problem is?
Data Model
class OHLCV(m.Model):
    """ Candlestick data (open, high, low, close, volume) """
    # class variables
    _field_names = None
    timeframes = ['1m', '1h', '1d']
    # database fields
    timestamp = m.DateTimeField(default=timezone.now)
    market = m.ForeignKey('bc.Market', on_delete=m.SET_NULL, null=True,
                          related_query_name='ohlcv_markets',
                          related_name='ohlcv_market')
    timeframe = m.DurationField()  # 1 minute, 5 minutes, 1 hour, 1 day, or the like
    open = m.DecimalField(max_digits=20, decimal_places=10)
    high = m.DecimalField(max_digits=20, decimal_places=10)
    low = m.DecimalField(max_digits=20, decimal_places=10)
    close = m.DecimalField(max_digits=20, decimal_places=10)
    volume = m.DecimalField(max_digits=20, decimal_places=10)
Code Which Saves the Data Model
@classmethod
def fetch_ohlcv(cls, market: Market, timeframe: str, since=None, limit=None):
    """
    Fetch OHLCV data and store it in the database
    :param market:
    :type market: bc.models.Market
    :param timeframe: '1m', '5m', '1h', '1d', or the like
    :type timeframe: str
    :param since:
    :type since: datetime
    :param limit:
    :type limit: int
    """
    global log
    if since:
        since = since.timestamp() * 1000
    exchange = cls.get_exchange()
    data = exchange.fetch_ohlcv(market.symbol, timeframe, since, limit)
    timeframe = cls.parse_timeframe_string(timeframe)
    for d in data:
        try:
            timestamp = datetime.fromtimestamp(d[0] / 1000, tz=timezone.utc)
            log.debug(f'timestamp={timestamp}, market={market}, timeframe={timeframe}, '
                      f'open={d[1]}, high={d[2]}, low={d[3]}, close={d[4]}, volume={d[5]}')
            cls.objects.create(
                timestamp=timestamp,
                market=market,
                timeframe=timeframe,
                open=d[1],
                high=d[2],
                low=d[3],
                close=d[4],
                volume=d[5],
            )
        except IntegrityError:
            pass
        except decimal.InvalidOperation as e:
            error_log_stack(e)
Have a look at your data and check whether it fits within the field limitations:
The total number of digits must fit in max_digits;
The number of decimal places must not exceed decimal_places;
And, according to the DecimalValidator, the number of whole digits must not be greater than max_digits - decimal_places.
I'm not sure how your fetch_ohlcv function fills the data array, but if there is division involved, it is possible that the number of decimal digits exceeds 10.
The problem I had, which brought me here, was too many digits in the integer part, thereby failing the last requirement.
Check this answer for more information on a similar issue.
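To make the failure concrete: when Django stores a DecimalField it quantizes the value in a decimal context whose precision is max_digits (roughly what django.db.backends.utils.format_number does), so this stdlib-only sketch is an approximation of the save path, not the exact Django code:

from decimal import Decimal, Context, InvalidOperation

# With max_digits=20 and decimal_places=10, only ten digits remain
# for the whole (integer) part of the number.
ctx = Context(prec=20)
ok = Decimal('1234567890.0123456789')     # 10 whole digits: fits
bad = Decimal('123456789012.0123456789')  # 12 whole digits: too many

for value in (ok, bad):
    try:
        print(ctx.quantize(value, Decimal('1.0000000000')))
    except InvalidOperation:
        print(f'{value} -> decimal.InvalidOperation (exceeds the field precision)')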

Headings for 13 months on Array

I was wondering if someone can assist me: why is the code below not creating the headers correctly for 13 months? I would like them to run Jan (LY), Feb (LY), ... until Jan (TY).
Please see the heading code below:
//Headings
J := 0;
for I := 12 downto 0 do
begin
  J := J + 1;
  Amonth := IntToStr(J);
  if Basemonth - 12 <= 0 then HArray[J] := Basemonth - I + 13;
  if Basemonth - 12 > 0 then HArray[J] := Basemonth - I;
  if HArray[J] > 13 then HArray[J] := HArray[J] - 13;
  //ShowMessage(IntToStr(HArray[J]));
end;
Heading1 := Monthcalc(HArray[1]);
Heading2 := Monthcalc(HArray[2]);
Heading3 := Monthcalc(HArray[3]);
Heading4 := Monthcalc(HArray[4]);
Heading5 := Monthcalc(HArray[5]);
Heading6 := Monthcalc(HArray[6]);
Heading7 := Monthcalc(HArray[7]);
Heading8 := Monthcalc(HArray[8]);
Heading9 := Monthcalc(HArray[9]);
Heading10 := Monthcalc(HArray[10]);
Heading11 := Monthcalc(HArray[11]);
Heading12 := Monthcalc(HArray[12]);
Heading13 := Monthcalc(HArray[13]);
// ShowMessage(DateToStr(startdate));
// ShowMessage(DateToStr(enddate));
// ShowMessage('test');
end;
function Monthcalc(Amonth: integer): string;
begin
  Monthname[1] := 'Jan';
  Monthname[2] := 'Feb';
  Monthname[3] := 'Mar';
  Monthname[4] := 'Apr';
  Monthname[5] := 'May';
  Monthname[6] := 'Jun';
  Monthname[7] := 'Jul';
  Monthname[8] := 'Aug';
  Monthname[9] := 'Sep';
  Monthname[10] := 'Oct';
  Monthname[11] := 'Nov';
  Monthname[12] := 'Dec';
  Monthname[13] := 'LY';
  Result := Monthname[Amonth];
  // ShowMessage(DateToStr(startdate));
  // ShowMessage(DateToStr(enddate));
  // ShowMessage('test');
end;
May I first suggest that you change the month name array to a constant:
const
  MonthNames: array[1..12] of string =
    ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
     'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec');
Next I suggest some changes to avoid a hardcoded time span of 13 months because, as you are now experiencing, requirements do change, and hardcoded values are always harder to change than variable ones.
Declare a variable, e.g. MonthSpan: integer, to indicate how many months should be included (a future requirement for alternative time spans of 3 months, 6 months, etc. is quite possible). Also replace all the Heading1, Heading2, ... variables with a dynamic array named Headings:
var
  MonthSpan: integer;
  Headings: array of string;
For now I just initialize these at the beginning:
MonthSpan := 13;
SetLength(Headings, MonthSpan);
You have already calculated StartDate: TDate earlier, so using that and the existing variable M we can write a simple loop to fill the headings:
M := MonthOf(StartDate);
for i := 0 to MonthSpan - 1 do
begin
  // ((M - 1 + i) mod 12) + 1 stays within 1..12; a plain (M + i) mod 12
  // would yield 0 (an invalid index) whenever M + i is a multiple of 12
  Headings[i] := MonthNames[((M - 1 + i) mod 12) + 1];
  Memo1.Lines.Add(Headings[i]);
end;
MonthOf() is a function in unit System.DateUtils.
The above replaces function Monthcalc and all of your code below the //Headings comment. Elsewhere in your code, where you used e.g. Heading1, you now use Headings[0] (dynamic arrays are always indexed from 0), and so on.
I'm not sure what the purpose of HArray[] is, but it is not needed for the determination of the headings.
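As an aside, the wrap-around arithmetic is easy to sanity-check in any 0-indexed language. This hypothetical Python sketch mirrors the Pascal expression above (((M - 1 + i) mod 12) + 1 in 1-based indexing becomes (m - 1 + i) % 12 in 0-based indexing):

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

def month_headings(start_month: int, span: int = 13) -> list[str]:
    """Labels for `span` consecutive months starting at start_month (1..12)."""
    return [MONTHS[(start_month - 1 + i) % 12] for i in range(span)]

print(month_headings(2))  # a 13-month span from February wraps: Feb ... Dec, Jan, Feb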

DbGeography - select polygons/linestrings contained inside polygon

I have a case where I want to load the map objects located inside the user's viewport.
This is how I create the user's viewport rectangle:
DbGeography viewport_rectangle = DbGeography.FromText(string.Format("POLYGON(({0} {1}, {0} {2}, {3} {2}, {3} {1}, {0} {1}))", lon_min, lat_min, lat_max, lon_max));
Then I want to select all objects (Points, PolyLines, Polygons, located inside that rectangle):
var objects = db.mapobjects.Where(x => !x.LocationGeographic.Intersects(viewport_rectangle));
Everything up to here works fine. The problem occurs if a polyline/polygon is not entirely contained inside the viewport polygon: in that case it is ignored, and I get "no objects" instead of the objects whose points or edges lie partly outside the viewport.
Is there any alternative to Intersects? I want to select the objects in the viewport rectangle regardless of whether they are entirely inside it or only a small part of them is.
viewport_rectangle = {SRID=4326;POLYGON ((15.693584159016611 46.532346466357438, 15.693584159016611 46.532770863495614, 15.695530101656916 46.532770863495614, 15.695530101656916 46.532346466357438, 15.693584159016611 46.532346466357438))}
Object which is only partially located inside viewport_rectangle and should be returned as a result:
LINESTRING (15.694189164787527 46.532622094224166, 15.694309193640944 46.532614944062828, 15.694392677396532 46.5326121762582, 15.694401059299702 46.532662919320614, 15.694536175578829 46.532621632923423, 15.694564338773485 46.532659690218026, 15.694584455341097 46.532614944062828, 15.694570373743769 46.532578039989573, 15.694489236921068 46.53258611275777, 15.694502312690016 46.532539290685662, 15.694723930209872 46.53252614359414, 15.69474438205361 46.532575041532539, 15.694786962121723 46.532516225610692, 15.694763492792843 46.532481858630774, 15.694699790328738 46.532507922181281, 15.694884862750767 46.532493852478581, 15.694849658757446 46.53254505695287)
One part of the LINQ-generated query:
SELECT
    [Filter1].[ObjectId] AS [ObjectId],
    [Filter1].[LocationGeographic] AS [LocationGeographic],
FROM (SELECT [Extent1].[ObjectId] AS [ObjectId], [Extent1].[LocationGeographic] AS [LocationGeographic]
      FROM [dbo].[mapobjects] AS [Extent1]
      WHERE (([Filter1].[LocationGeographic].STIntersects(@p__linq__0)) <> cast(1 as bit))
     ) AS [Project1]
Edited:
The correct point order for viewport_rectangle should be:
DbGeography viewport_rectangle = DbGeography.FromText(string.Format("POLYGON(({0} {1}, {2} {1}, {2} {3}, {0} {3}, {0} {1}))", lon_min, lat_min, lon_max, lat_max));
You appear to have a ring orientation problem with your polygon. The order in which you specify your points matters. The polygon, as you've defined it, is the entire globe minus a very small square (presumably, your desired viewport). How did I determine this?
declare @line geography = geography::STGeomFromText('LINESTRING (15.694189164787527 46.532622094224166, 15.694309193640944 46.532614944062828, 15.694392677396532 46.5326121762582, 15.694401059299702 46.532662919320614, 15.694536175578829 46.532621632923423, 15.694564338773485 46.532659690218026, 15.694584455341097 46.532614944062828, 15.694570373743769 46.532578039989573, 15.694489236921068 46.53258611275777, 15.694502312690016 46.532539290685662, 15.694723930209872 46.53252614359414, 15.69474438205361 46.532575041532539, 15.694786962121723 46.532516225610692, 15.694763492792843 46.532481858630774, 15.694699790328738 46.532507922181281, 15.694884862750767 46.532493852478581, 15.694849658757446 46.53254505695287)', 4326),
        @poly geography = geography::STGeomFromText('POLYGON ((15.693584159016611 46.532346466357438, 15.693584159016611 46.532770863495614, 15.695530101656916 46.532770863495614, 15.695530101656916 46.532346466357438, 15.693584159016611 46.532346466357438))', 4326);

select @poly.EnvelopeAngle();                       -- returns 180
select @poly.ReorientObject().STIntersects(@line);  -- returns 1
Best you read up on the EnvelopeAngle() method yourself, but I'll say this: I use it as a quick heuristic to detect the ring-orientation problem you have here. Invariably, if a polygon has this problem, its envelope angle will be 180 (which is almost never what you intended).
I've also given away the punchline on how to fix it in the code above: calling ReorientObject() on the polygon changes clockwise to counterclockwise (and vice versa).
Finally, it looks like your line string is fully contained within your (re-oriented) viewport; I tested with STContains(). That explains why you were getting false before: what you thought was your viewport was everything but the viewport!
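A closing aside not from the thread: the corrected point order in the edit above runs counterclockwise, which keeps the interior (the small viewport square) on the left of the ring, as the geography type expects. A hypothetical Python helper that mirrors that corrected format string:

def viewport_wkt(lon_min: float, lat_min: float,
                 lon_max: float, lat_max: float) -> str:
    """Build a viewport POLYGON in counterclockwise (interior-on-the-left) order."""
    ring = [(lon_min, lat_min), (lon_max, lat_min),
            (lon_max, lat_max), (lon_min, lat_max),
            (lon_min, lat_min)]  # close the ring back at the starting point
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON(({coords}))"

print(viewport_wkt(15.693584159016611, 46.532346466357438,
                   15.695530101656916, 46.532770863495614))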
