sampling error from multivariate skew normal distribution using sn package in R

I am trying to sample from skew normal distributions of the following 4 variables (MAT - mean annual temperature, Tmin - minimum temperature, MAP - mean annual precipitation, and WQP - precipitation of the warmest quarter) using the sn package in R. Their xi, Omega, and alpha parameters are as follows.
s_xi <- c(8.9, -1.5, 673.5, 202.7)
s_Omega <- matrix(c( 2.1,   2.9,    76.3,   -2.3,
                     2.9,   4.9,     9.4,    0.4,
                    76.3,  94.6, 22614.1, 2519.0,
                    -2.3,  0.37,    2519,  915.2), 4, 4, byrow=TRUE)
s_alpha <- c(1.8, -4, 6.7, -3.6)
camp_sample2 <- rmsn(n=1000, xi=s_xi, Omega=s_Omega, alpha=s_alpha)
When I ran bivariate sampling of MAT and MAP, the same code worked fine. However, the above code with 4 variables does not work; I get the following error message.
camp_sample2 <- rmsn(n=1000, xi=s_xi, Omega=s_Omega, alpha=s_alpha)
Error in pd.solve(Omega) : x appears to be not symmetric
Can anyone interpret this error message and help me with the correct code?
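The error comes from pd.solve(), which expects Omega to be a symmetric (and positive definite) scale matrix; in s_Omega the [2,3]/[3,2] and [2,4]/[4,2] entries disagree (9.4 vs 94.6 and 0.4 vs 0.37), so the matrix is not symmetric. A minimal sketch of how to check this and, if the disagreements are simply data-entry slips, force symmetry (whether averaging the mismatched entries is appropriate depends on which values are the intended covariances):
# Check whether the scale matrix is symmetric before calling rmsn()
isSymmetric(s_Omega)                     # FALSE for the matrix above
# If the mismatched entries are typos, one option is to symmetrise by averaging
s_Omega_sym <- (s_Omega + t(s_Omega)) / 2
isSymmetric(s_Omega_sym)                 # TRUE
camp_sample2 <- rmsn(n = 1000, xi = s_xi, Omega = s_Omega_sym, alpha = s_alpha)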

Related

How can I obtain overall p-value in feglm model?

I am running a project to find factors associated with a certain blood test being performed; let's say a diabetes blood test for this post. The variables that I have are 1) year (2018, 2019, 2020), 2) gender (male, female, other), 3) clinic location of individual clinics (metropolitan, regional, rural), and 4) age group (20-29, 30-39, 40-49, 50-59, 60-89 yrs old). The data is a sample clustered by medical clinic (clinic_id).
I tried the survey, srvyr and fixest packages (cluster sample) and found that the results of feglm from the fixest package were very similar to those of Stata.
I fit this model with the fixest package using the following script:
model_tested <- feglm (tested ~ year + gender + clinic_location + age_group, data = tested_proportion, family = "binomial", se = "cluster", cluster= ~clinic_id)
I was able to obtain individual p-values like the following:
            Pr(>|t|)
year2019     0.71101
year2020     0.00973
female       0.00000
other        0.08090
age20-29     0.00000
age30-39     0.00000
age40-49     0.39693
age50-59     0.00000
age60-80     0.00000
In glm, I can run an Anova (or aov) test to obtain an overall p-value for each variable, such as the p-value of year, gender and age group.
However, I cannot run anova(model_tested); I get an error message saying that the Anova test is not supported for feglm models.
I tried the following script to obtain the overall p-value of each variable, using wald.test:
p_overall_year <- aod::wald.test(Sigma = vcov(model_tested), b = coef(model_tested), Terms = 2:3)
p_overall_gender <- aod::wald.test(Sigma = vcov(model_tested), b = coef(model_tested), Terms = 4:5)
p_overall_age <- aod::wald.test(Sigma = vcov(model_tested), b = coef(model_tested), Terms = 6:10)
My question is, is there a better way to obtain the overall p-values for each variable?
Also, these did give an overall p-value for each group, but the values were somewhat different from those I obtained in Stata with the command testparm i(2018/2020).year, which reports an adjusted Wald test. For example, the overall p-value of year in R was 0.0013, whereas in Stata it was 0.0891.
Are there any other methods I can try in R to get overall p-values closer to Stata's?
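If the Stata result really is an adjusted Wald (F) test, part of the gap may simply be that the chi-square version of aod::wald.test ignores the number of clusters. A hedged sketch of one way to get closer, using wald.test's own df argument (whether df = number of clusters - 1 matches Stata's exact adjustment is an assumption):
# Number of clusters; clinic_id and tested_proportion are the objects from the question
G <- length(unique(tested_proportion$clinic_id))
# F-type Wald test for the year coefficients (positions 2:3),
# using G - 1 as the denominator degrees of freedom (an approximation)
p_overall_year_F <- aod::wald.test(Sigma = vcov(model_tested),
                                   b = coef(model_tested),
                                   Terms = 2:3,
                                   df = G - 1)
p_overall_year_F   # prints the chi-square test and the F version of the test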

PostGIS Raster compute min, max altitude value and slope from a DEM

I have a DEM dataset and some polygons that represent parcels. For each parcel I would like to compute the maximum/minimum altitude and the average slope. Based on the PostGIS documentation and several examples on the Internet, two functions could be used to compute this: ST_SummaryStatsAgg and ST_DumpAsPolygons.
So I've created a trigger that computes some statistics before a new parcel is inserted, but I am confused by the results. Here is my code:
--First cut the raster based on the parcel shape
SELECT INTO __rasterClip
ST_Union(ST_Clip(foncier.reunion_mnt.rast, NEW.geom, -9999, TRUE))
FROM foncier.reunion_mnt
WHERE NEW.geom && foncier.reunion_mnt.rast;
--Compute slope with ST_DumpAsPolygons OR ST_SummaryStatsAgg
SELECT INTO __slope1 (ST_DumpAsPolygons(ST_Slope(__rasterClip, 1, '32BF', 'DEGREES', 1.0))).val;
SELECT INTO __slope2 (ST_SummaryStatsAgg(ST_Slope(__rasterClip, 1, '32BF', 'DEGREES', 1.0), 1, TRUE, 1)).max;
RAISE NOTICE 'Slope1 %', MAX(__slope1 );
RAISE NOTICE 'Slope2 %', __slope2;
--Compute min/max altitude
SELECT INTO __rasterStats (ST_SummaryStatsAgg(__rasterClip, 1, TRUE, 1)).*;
SELECT INTO __polyDump (ST_DumpAsPolygons(__rasterClip, 1, TRUE)).*;
RAISE NOTICE 'Stat % - %', __rasterStats.min, __rasterStats.max;
RAISE NOTICE 'Poly % - %', Min( __polyDump.val ), Max( __polyDump.val );
The results of the RAISE NOTICE:
NOTICE: Slope1 5.14276456832886
NOTICE: Slope2 51.9147148132324
NOTICE: Stat 222.76 - 251.22
NOTICE: Poly 225.929992675781 - 225.929992675781
There is clearly something wrong: the slope reported by the two functions is not the same, and the min and max altitude from ST_DumpAsPolygons are identical.
So could you please help me and tell me:
What is the most effective way to compute the min/max altitude and the average slope for a parcel based on a raster DEM?
For my general knowledge, is it better to use ST_SummaryStatsAgg or ST_DumpAsPolygons? In which cases is one preferable to the other?
In a trigger, how should I declare the variable types for the results of these two functions (ST_SummaryStatsAgg, ST_DumpAsPolygons)? My first attempt was to declare them using their return types (summarystats and geomval), but I was getting errors, so I switched to record. Is that correct?
Thanks for your help!
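For what it's worth, since __rasterClip is already a single raster after ST_Union, the non-aggregate ST_SummaryStats can be applied to it directly for both the altitude and the slope statistics. A minimal sketch inside the same trigger, assuming hypothetical variables __stats and __slopeStats declared as summarystats (or record):
-- Altitude statistics of the clipped raster (band 1, ignoring NODATA)
SELECT INTO __stats (ST_SummaryStats(__rasterClip, 1, TRUE)).*;
-- Slope statistics: compute slope on the unioned clip, then summarise it
SELECT INTO __slopeStats (ST_SummaryStats(ST_Slope(__rasterClip, 1, '32BF', 'DEGREES', 1.0), 1, TRUE)).*;
RAISE NOTICE 'altitude % - %, mean slope %', __stats.min, __stats.max, __slopeStats.mean;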

Error in Matrices concatenation, inconsistent dimensions

%% INPUT DATA
input_data = [200 10.0 0.0095; %C1
240 7.0 0.0070; %C2
200 11.0 0.0090; %C3
220 8.5 0.0090; %C4
220 10.5 0.0080; %C5
0.0015 0.0014 -0.0001 0.0009 -0.0004 %Power Loss
];
pd=830; %Power Demand
%%
lambda = input ('Enter initial lambda:')
Could anyone help me fix this? I've checked the row and column dimensions but still can't fix the error.
Your 6th row, the power-loss row, contains 5 entries, whereas the rows for C1 to C5 contain only 3 entries each. MATLAB doesn't do Swiss cheese (ragged rows); I'd suggest making the power-loss coefficients a separate variable.
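A minimal sketch of that suggestion (the variable names here are illustrative):
%% INPUT DATA, with the power-loss coefficients kept separate
gen_data = [200 10.0 0.0095;   %C1
            240  7.0 0.0070;   %C2
            200 11.0 0.0090;   %C3
            220  8.5 0.0090;   %C4
            220 10.5 0.0080];  %C5
loss_coeff = [0.0015 0.0014 -0.0001 0.0009 -0.0004]; %Power Loss, one entry per unit
pd = 830;                      %Power Demand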

loading, editing and writing multiple data sets in R

I am relatively new to R, but have so far managed through effective googling. Unfortunately, I am having a hard time resolving my current problem with Google.
I have a large number of files which I would like to edit, and save as a different format. The tables contain latitude, longitude and mean temperature data in ~80,000 rows. What I need to do is select only the regions and months germane to my study. While I can do this easily enough for a single file, I cannot seem to automate the process for the 111 files I have (I would like to keep them separate, as it will be easier for downstream GIS applications).
For a single file, this is the process that works for me:
test<-read.fwf("air_temp.1900", widths = c(8,8,8,8,8,8,8,8,8,8,8,8,8,8), header=F)
test1<-subset(test, V1 > -135 & V1< -55 & V2 < 80 & V2 > 55,
select=c("V1","V2","V8","V9","V10"))
write.csv(test1, file="test1.csv", row.names=F)
Here is an example of the data structure (the first two columns correspond to longitude and latitude, and the rest mean monthly temperatures Jan-Dec):
> test[1:3,]
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14
1 -179.75 71.25 -26.7 -19.5 -22.5 -22.3 -8.0 -0.6 2.5 0.3 -2.6 -9.6 -12.6 -23.6
2 -179.75 68.75 -28.5 -21.3 -24.4 -24.4 -8.0 0.0 4.0 0.8 -2.8 -11.1 -14.1 -26.5
3 -179.75 68.25 -29.2 -22.0 -25.2 -25.1 -8.9 -0.1 3.9 0.7 -3.4 -12.0 -14.9 -27.5
Here is my attempt at automating the process, though it is obviously flawed:
names<-list.files(pattern='air_temp.*')
names1<-substr(names,1,13)
for(i in names1){
filepath <- file.path("...Climate.geo.udel.edu/",paste(i))
assign(i, read.fwf(filepath, widths = c(8,8,8,8,8,8,8,8,8,8,8,8,8,8), header=F))
#up to here works fine, I can automate the loading of these files into R
#but editing and exporting them doesn't seem to work
subset(i, V1 > -135 & V1 < -55 & V2 < 80 & V2 > 55, select=c(V1,V2,V8,V9,V10))
}
Then the process fails with the following message:
"Error in subset.default(i, V1 > -135 & V1 < -55 & V2 < 80 & V2 > 55, select = c(V1, :
object 'V1' not found"
I can infer that "i" is not the correct object, but I cannot seem to figure out what I am supposed to put there.
I haven't even begun trying to automate the write.csv portion, so any advice in that regard would be greatly appreciated.
Thank you in advance,
Your error is saying that there's no object named V1, which is true: read.fwf returns a data.frame, and V1 is a column inside that data.frame, not a free-standing variable. Note also that inside your loop i holds the file name (a character string), so subset() falls back to subset.default; you need get(i) to retrieve the data.frame you stored with assign().
With that in mind, subset() can evaluate the column names inside the data.frame for you:
subset(get(i), V1 > -135 & V1 < -55 & V2 < 80 & V2 > 55, select = c(V1, V2, V8, V9, V10))
To access a single column from the variable directly, you would use the $ syntax:
get(i)$V1
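An easier route is to skip assign()/get() entirely and keep everything inside the loop, writing each subset straight back out to a CSV named after its input file. A sketch under the assumption that the files sit in the working directory (the output naming scheme is just an example):
files <- list.files(pattern = 'air_temp.*')
for (f in files) {
  dat <- read.fwf(f, widths = rep(8, 14), header = FALSE)   # same widths as above
  keep <- subset(dat, V1 > -135 & V1 < -55 & V2 < 80 & V2 > 55,
                 select = c(V1, V2, V8, V9, V10))
  write.csv(keep, file = paste0(f, ".csv"), row.names = FALSE)
}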

Averaging data for points in close proximity with SQL Server 2008

I have an application which receives GPS data from a mobile device; as well as co-ordinate data, it also provides signal strength from the GSM network.
I am trying to plot the points on a map to display areas of good signal strength and areas of poor signal strength.
When I have a few points it all works well: the points are retrieved from the database and a square is built around each point, with the top-left corner 0.5 km from the point. I then display the squares on the VE map, colour-coded by signal strength.
The problem is that there may be thousands and thousands of readings, so I need a way either to average out readings that are less than 0.5 km from each other, or to build the squares (or perhaps circles) in SQL Server and average out the intersections.
I have no idea where to begin with this so any pointers to decent articles or some tips would be much appreciated.
Thanks.
One simple and somewhat inaccurate way to do this would be to decrease the granularity of your data. It might not even be inaccurate, depending on how accurate your x, y measurements are.
Let's say we have the following data:
x      y     signal_strength
10.2   5.1   10
10.1   5.3   12
10.3   5.5    8
If we floor the x and y values, we get:
x    y    signal_strength
10   5    10
10   5    12
10   5     8
Then we can average those values by the floored x and y to show that we have an average signal strength of 10 in the rectangle (10, 5) to (11, 6).
Here's the SQL:
select
floor(x) as rectangle_xmin,
floor(y) as rectangle_ymin,
floor(x) + 1 as rectangle_xmax,
floor(y) + 1 as rectangle_ymax,
avg(signal_strength) as signal_strength
from table
group by floor(x), floor(y);
Now, admittedly, you'd ideally want to group data points by distance from point to point, whereas this floors them into rectangular blocks whose maximum internal distance varies from 1 (along an axis) to sqrt(2) ≈ 1.41 (along the diagonal). So it's less than ideal, but it may work well enough for you, especially if the flooring/grouping is smaller than the error in your position measurements.
If floor() is not granular enough, you can use floor( x * someweight) / someweight to adjust it to the granularity you want. And of course you can use ceil() or round() to do the same thing.
The whole point is to collapse a bunch of nearby measurements to one "measurement", and then take the average of the collapsed values.
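For example, to shrink the blocks to 0.5 units on a side you would use a weight of 2 (the table name here is just a placeholder):
select
  floor(x * 2) / 2.0 as rectangle_xmin,
  floor(y * 2) / 2.0 as rectangle_ymin,
  avg(signal_strength) as signal_strength
from readings
group by floor(x * 2) / 2.0, floor(y * 2) / 2.0;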
You might want to look into Delaunay triangulation, where you can plot X, Y, Z coordinates into a graph. It might be possible, not knowing exactly what you have for points, to use X, Y for the location, plot Z as the signal strength, and create a spike graph. I've only seen C++ examples (a CodePlex sample), but it might be something you can write a SQL function for.
SELECT
geography::STPointFromText('POINT(' + CONVERT(varchar, AvgSignalReadings.lng / 100) + ' ' + CONVERT(varchar, AvgSignalReadings.lat / 100) + ')', 4326) as Location,
AvgSignalReadings.lat / 100 as Latitude,
AvgSignalReadings.lng / 100 as Longitude,
AvgSignalReadings.SignalStrength
FROM
(
SELECT
FLOOR(l.Latitude * 100) as lat,
FLOOR(l.Longitude * 100) as lng,
AVG(l.SignalStrength) as SignalStrength,
COUNT(*) as NumberOfReadings
FROM SignalLog l
WHERE l.SignalStrength IS NOT NULL AND l.SignalStrength <> 0 AND l.Location IS NOT NULL
AND l.[Timestamp] > DATEADD(month, -1, GETDATE())
GROUP BY FLOOR(l.Latitude * 100), FLOOR(l.Longitude * 100))
AS AvgSignalReadings
