Extracting numbers from a series of text files

I have 12 files all with the same format:
Statistics Information
Line: 4
Fiducial range: 156 to 364
Number of items: 209
Number of dummies: 0
Minimum value: -0.08983668870989447
Maximum value: 0.059795797205623558
Mean value: -0.00884060126461031
Standard deviation: 0.03707261357656038
Arithmetic sum: -1.8476856643035546
Each file corresponds to a manoeuvre in a specific direction: North (Pitch, Roll, Yaw), South (Pitch, Roll, Yaw), East (Pitch, Roll, Yaw), and West (Pitch, Roll, Yaw).
I want to cycle through these text files, store the minimum, maximum, and mean value from each file, and then export them as a table:
NORTH   Pitch                   Roll    Yaw
Min     -0.08983668870989447
Max     0.059795797205623558
Mean    -0.00884060126461031

SOUTH   Pitch   Roll    Yaw
Min
Max
Mean

et cetera
So far I have managed to list the different files and then extract the first line:
import glob

txt_files = glob.glob("*.txt")

def read_first_line(txt_file):
    with open(txt_file, 'rt') as fd:
        first_line = fd.readline()
    return first_line

output_strings = map(read_first_line, txt_files)  # apply read_first_line to every text file
print(txt_files)
output_content = "".join(sorted(output_strings))  # combine into a single string
print(output_content)  # print as formatted
with open('outfile.txt', 'wt') as fd:
    fd.write(output_content)
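
Building on this, here is a minimal sketch of the full extraction. The statistics keys ("Minimum value", etc.) come from the file format above; the tab-separated output layout and the idea that filenames encode direction and axis (e.g. north_pitch.txt) are my assumptions:

import glob

# the three lines we want from each statistics file
WANTED = ("Minimum value", "Maximum value", "Mean value")

def read_stats(txt_file):
    """Return e.g. {'Minimum value': -0.0898...} for one file."""
    stats = {}
    with open(txt_file) as fd:
        for line in fd:
            key, _, value = line.partition(":")
            if key.strip() in WANTED:
                stats[key.strip()] = float(value)
    return stats

# collect inputs first so the output file is not picked up on a rerun;
# sorting keeps files grouped by direction if names are e.g. north_pitch.txt
paths = [p for p in sorted(glob.glob("*.txt")) if p != "outfile.txt"]

with open("outfile.txt", "wt") as out:
    out.write("File\tMin\tMax\tMean\n")
    for path in paths:
        s = read_stats(path)
        out.write(f"{path}\t{s['Minimum value']}\t"
                  f"{s['Maximum value']}\t{s['Mean value']}\n")

Reshaping these rows into the NORTH/SOUTH blocks shown above is then just a matter of grouping the parsed filenames.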


How to sample a DataFrame?

The goal is to subsample a data frame.
Code:
# 1 the date column is of type datetime
dg.Yr_Mo_Dy = pd.to_datetime(dg.Yr_Mo_Dy, format='%Y%m%d')
# 2 set the date as the index
dg = dg.set_index(dg.Yr_Mo_Dy, drop=True)
# 3 resample by year ('1AS' = annual, year start) and average
dg.resample('1AS').mean().mean()
That gives:
RPT 14.847325
VAL 12.914560
ROS 13.299624
KIL 7.199498
SHA 11.667734
BIR 8.054839
DUB 11.819355
CLA 9.512047
MUL 9.543208
CLO 10.053566
BEL 14.550520
MAL 18.028763
dtype: float64
The code groups the values and takes the average of each group.
Similarly, it is also possible to sum the values in each group by replacing mean() with sum().
However, what I want is not an average but a sampling: keep just one of the values in each group, without averaging or summing the intermediate values.
For example, the data 1,2,3,4,5,6,... sampled at a rate of 0.5 gives 2,4,6,... and not 1.5,3.5,5.5,...
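
A sketch of two ways to get such a sampling in pandas; the frame below is a toy stand-in for dg (one column name borrowed from the output above), and the frequencies are illustrative:

import numpy as np
import pandas as pd

# toy frame standing in for dg: daily values with a datetime index
dg = pd.DataFrame({'RPT': np.arange(100.0)},
                  index=pd.date_range('2000-01-01', periods=100))

# option 1: purely positional sampling, keep every 10th row
every_10th = dg.iloc[::10]

# option 2: keep the resample-style grouping, but pick one value per
# group instead of aggregating (.first()/.last() select, not average)
monthly_pick = dg.resample('MS').first()

print(every_10th.head())
print(monthly_pick.head())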

Extraction of matched dataset from MatchThem

I have browsed almost every page I can find on the subject and I still can't find a way to extract a matched dataset with the MatchThem package.
By analogy, MatchIt's match.data() function extracts the dataset of matched data (for example 3:1). Although MatchThem's complete() function is the equivalent, it apparently does not allow extracting exclusively the imputed AND matched dataset.
Here is an example of multiple imputation with 3:1 matching from which I am trying to extract multiple matched datasets:
library(mice)
library(MatchThem)
#Multiple imputations
mids_object <- mice(data, maxit = 5, m=3, seed= 20211022, printFlag = F) # m=3 is voluntarily low for this example.
#Matching
mimids_object <- matchthem(primary_subtype ~ age + bmi + ps, data = mids_object, approach = "within" ,ratio= 3, method = "optimal")
#Details of matched data
print(mimids_object)
Printing | dataset: #1
A matchit object
method: Variable ratio 3:1 optimal pair matching
distance: Propensity score
- estimated with logistic regression
number of obs: 761 (original), 177 (matched)
target estimand: ATT
covariates: age, bmi, ps
#Extracting matched dataset
complete(mimids_object, action = "long") -> complete_mi_matched
#Summary of extracted dataset to check correct number of match
summary(complete_mi_matched$primary_subtype)
classic ADK         SRC
        702          59
It should instead show the 3:1 matched proportions: 177 matched classic ADK and 59 SRC.
I am missing something. Thanks in advance for your help or suggestions.

How to print averages in this format from a text array

Print, for each data set that is input, the average to four decimal places. This average should be preceded each time by “For Competitor #X, the average score is ”, where X denotes the competitor’s position (starting with 1) in the input file.
Output to screen for the above input file:
For Competitor #1, the average is 5.8625
For Competitor #2, the average is 0.0000
For Competitor #3, the average is 1.0000
file:///C:/Users/tram/Downloads/gym.PNG
My code is in the linked screenshot, and it printed:
For Competitor #0, the average is 0
For Competitor #0, the average is 0
For Competitor #0, the average is 0
You don't mention which language you want it in.
In JavaScript you have:
var num=5.11234123
num.toFixed(4) // "5.1123"
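
If the code in the screenshot is Python (a guess, since the language is not stated), the same fixed-point rounding is a format specifier; the scores list here is hypothetical:

# hypothetical per-competitor scores; the first set averages to 5.8625
scores = [[5.9, 5.8, 5.9, 5.85], [0.0], [1.0]]

# enumerate(..., start=1) also fixes the "Competitor #0" numbering
for i, s in enumerate(scores, start=1):
    avg = sum(s) / len(s)
    print(f"For Competitor #{i}, the average is {avg:.4f}")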

Gnuplot: import x-axis from file

I have two files 'results.dat' and 'grid.dat'.
results.dat contains a different data set of y values in each row:
1 325.5 875.4 658.7 365.5
2 587.5 987.5 478.6 658.5
3 987.1 542.6 986.2 458.7
grid.dat contains the corresponding x values:
1 100.0 200.0 300.0 400.0
How can I plot, with gnuplot, grid.dat as the x values and a specific line of results.dat as the corresponding y values? E.g. for line 3:
1 100.0 987.1
2 200.0 542.6
3 300.0 986.2
4 400.0 458.7
Thanks in advance.
That's quite similar to the recent question Gnuplot: plotting the maximum of two files. In your case, too, it is not possible with gnuplot alone.
You need an external tool to combine the two files on the fly, e.g. the following Python script (any other tool would do):
""" selectrow.py: Select a row from 'results.dat' and merge with 'grid.dat'."""
import numpy as np
import sys
line = int(sys.argv[1])
A = np.loadtxt('grid.dat')
B = np.loadtxt('results.dat', skiprows=(line-1))[0]
np.savetxt(sys.stdout, np.c_[A, B], delimiter='\t')
And then plot the third line of results.dat with
plot '< python selectrow.py 3' w l
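With the two files from the question, the merged stream for line 3 looks like this (values taken from results.dat and grid.dat above; the '%g' format drops the trailing .0 on the x values):

$ python selectrow.py 3
100	987.1
200	542.6
300	986.2
400	458.7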

How to calculate rolling volatility

I am trying to design a function that will calculate 30-day rolling volatility.
I have a file with 3 columns: a date, and daily returns for 2 stocks.
How can I do this? I am having trouble summing the first 30 entries to get my vol.
Edit:
So it will read a csv file with 3 columns: a date, and daily returns for the two stocks.
daily.ret = read.csv("abc.csv")
e.g.
date        stock1  stock2
01/01/2000  0.01    0.02
and so on, with years of data. I want to calculate rolling 30-day annualised vol.
This is my function:
calc_30day_vol = function()
{
  # squared daily returns
  stock1 = abc$stock1^2
  stock2 = abc$stock2^2
  j = 30
  approx_days_in_year = length(abc$stock1) / 10
  vol_1 = 1:length(stock1)
  vol_2 = 1:length(stock2)
  for (i in 1:(length(stock1) - 29))
  {
    # sum() rather than rowSums(), since stock1[i:j] is a plain vector
    vol_1[j] = sqrt((approx_days_in_year / 30) * sum(stock1[i:j]))
    vol_2[j] = sqrt((approx_days_in_year / 30) * sum(stock2[i:j]))
    j = j + 1
  }
}
So stock1 and stock2 are the squared daily returns from the file, needed to calculate vol. Entries 1-30 of vol_1 and vol_2 are empty since we are calculating 30-day vol. I am trying to sum the squared daily returns over the first 30 entries (I first tried rowSums), and then move the index down one step per iteration.
So day 1-30, day 2-31, day 3-32, etc., hence why I have defined "j".
I'm new to R, so apologies if this sounds rather silly.
This should get you started.
First, I create some data that look like what you describe:
library(quantmod)
getSymbols(c("SPY", "DIA"), src='yahoo')
m <- merge(ROC(Ad(SPY)), ROC(Ad(DIA)), all=FALSE)[-1, ]
dat <- data.frame(date=format(index(m), "%m/%d/%Y"), coredata(m))
tmpfile <- tempfile()
write.csv(dat, file=tmpfile, row.names=FALSE)
Now I have a csv with data in your very specific format.
Use read.zoo to read the csv and then convert to an xts object (there are lots of ways to read data into R; see R Data Import/Export):
r <- as.xts(read.zoo(tmpfile, sep=",", header=TRUE, format="%m/%d/%Y"))
# each column of r has daily log returns for a stock price series
# use `apply` to apply a function to each column.
vols.mat <- apply(r, 2, function(x) {
  # use a rolling 30-day window to calculate the standard deviation,
  # and annualize by multiplying by the square root of time
  runSD(x, n=30) * sqrt(252)
})
#`apply` returns a `matrix`; `reclass` to `xts`
vols.xts <- reclass(vols.mat, r) #class as `xts` using attributes of `r`
tail(vols.xts)
# SPY.Adjusted DIA.Adjusted
#2012-06-22 0.1775730 0.1608266
#2012-06-25 0.1832145 0.1640912
#2012-06-26 0.1813581 0.1621459
#2012-06-27 0.1825636 0.1629997
#2012-06-28 0.1824120 0.1630481
#2012-06-29 0.1898351 0.1689990
#Clean-up
unlink(tmpfile)
