Basic interpretation of two-table operations (row-wise or not row-wise) in PostGIS

I'm new to PostGIS and I have been reading the docs. They are usually very well written, at least for tables of 1 row.
This is probably a silly question, or obvious to anyone who already knows PostGIS, but please help a little so that people coming from other languages can find their way in.
I have checked a lot from:
https://postgis.net/workshops/postgis-intro/
Sadly, I still can't find an answer to a simple question: the behavior of many functions in table-to-table operations.
I know R/sf and I'm trying to learn PostGIS, but every library tends to have its own way of relating geometries. For example, IIRC an intersects function exists in both sf and GeoPandas, yet the behavior differs between them even though the name is the same.
Let's pick an example:
https://postgis.net/docs/ST_Intersects.html
The function is defined as:
boolean ST_Intersects( geometry geomA , geometry geomB );
Both parameters are defined as geometry, which means each one can be a column or a single geometry, but it isn't stated what the behavior will be if the tables have more than 1 row. Maybe where it says "geometries" it will interpret the full table as one big geometry?
Then I can go to this link:
https://postgis.net/docs/geometry_overlaps.html
where I can finally see a result that looks like a matrix operation, at least to some extent. This is where the possibilities start to open up.
Is ST_Intersects a row-wise function?
Will it intersect every row from the first table with every row of the second table? And in that case, what would the return be? (We'd need a table of rows(table1) * rows(table2) entries; this is not written in the docs.) See the sketch below.
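For concreteness, here is the kind of query I mean (table_a and table_b are hypothetical tables, each with a geom column). As far as I understand SQL, the function itself is scalar and the row pairing comes from the join, not from the function:

SELECT a.id, b.id
FROM table_a AS a
JOIN table_b AS b
  ON ST_Intersects(a.geom, b.geom);  -- evaluated once per (a, b) row pair; only intersecting pairs survive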
The above are just the questions and points of confusion, using ST_Intersects as the example; now let's go back to the specific issue.
Probably the way functions relate tables is considered common sense in PostGIS, because the docs omit it, and not only for ST_Intersects but also for others like ST_Intersection, ST_Disjoint, etc. I think all of them have the same behavior; it is just implicit.
So, does PostGIS work element-to-element? table-to-element? element-to-table? table-to-table? Or some other interpretation? Or does every function have its own way that isn't written down, and I need to search somewhere else?
Thx!

Related

Reordering dataset in Stata

We have a dataset of 1222x20 in Stata.
There are 611 individuals, each occupying 2 rows of the dataset. In each individual's second row there is only one variable of interest that we would like to use.
This means that we want a dataset of 611x21 for our analysis.
It might also help if we could discard each odd/even row, and merge it later.
However, my Stata skills let me down at this point and I hope someone can help us.
Maybe someone knows any command or menu option that we might give a try.
If someone knows such a code, the individuals are characterized by the variable rescode, and the variable of interest on the second row is called enterprise.
Below, the head of our dataset is given (screenshot omitted). There is a binary time variable followup; we want to regress enterprise (yes/no) at time followup = "Followup" as the dependent variable on enterprise at time followup = "Baseline" as an independent variable.
We have tried something like this:
reg enterprise(if followup="Folowup") i.aimag group loan_baseline eduvoc edusec age16 under16 marr_cohab age age_sq buddhist hahl sep_f nov_f enterprise(if followup ="Baseline"), vce(cluster soum)
followup is a numeric variable with value labels, as its colouring in the Data Editor makes clear, so you can't test its values directly for equality or inequality with literal strings. (And if you could, the match needs to be exact, as Folowup would not be read as implying Followup.)
There is a syntax for identifying observations by value labels: see [U] 13.11 in the pdf documentation or https://www.stata-journal.com/article.html?article=dm0009.
However, it is usually easiest just to use the numeric value underneath the value label. So if the variable followup had numeric values 0 and 1, you would test for equality with 0 or 1.
You must use == not = for testing for equality here:
... if followup == 1
For any future Stata questions, please see the Stata tag wiki for detailed advice on how to present data. Screenshots are usually difficult to read and impossible to copy, and leave many details about the data obscure.

How to use PuLP to generate variables and constraints for a sparse matrix?

Hi there,
I am new to PuLP. I learned PuLP from some examples I found online. These examples are very helpful, and now I am able to write simple models by myself. But I still find it difficult to build complex models, especially models with a sparse matrix.
Could you please post some complex examples with sparse matrices and complex constraints? I want to learn how to create only the necessary variables, instead of a simple blanket definition such as y = LpVariable.dicts("y", (Factorys, Customers), 0, 1, LpBinary).
I have another question: what happens if I simply use y = LpVariable.dicts("y", (Factorys, Customers), 0, 1, LpBinary) to define the variables, most of which are useless in the model objective function and constraints, and then add constraints to explicitly set such useless variables to 0? Is PuLP able to first identify such useless variables and remove them, and then run an integer programming algorithm (such as B&B or B&C) to solve the problem at its reduced size? If so, it looks like the "set useless variables to 0" method would not decrease the solution speed at all. Am I right?
This may help
http://www.stuartmitchell.com/journal/2012/2/3/my-top-n-tips-for-python-coding-in-optimisation-1.html
In particular, first generate a sparse set of (factory, customer) pairs.
factories_customers = [(f, c) for f in factories for c in customers
                       if <insert your condition here>]
Then use
y = LpVariable.dicts("y", factories_customers, 0, 1, LpBinary)
PuLP does not remove "useless" variables and constraints itself, so the model build time will be long.
However, the solvers (CBC by default) contain pre-solve algorithms that will remove those variables.

Does anyone know of potential problems with ST_Line_Substring in PostGIS?

Specifically, I'm getting a result that I do not understand. It is possible that my understanding is simply wrong, but I don't think so. So I'm hoping that someone will either say "yes, that's a known problem" or "no, it is working correctly and here is why your understanding is wrong".
Here is my example.
To start I have the following geometry of lat/longs.
LINESTRING(-1.32007599 51.06707497,-1.31192207 51.09430508,-1.30926132 51.10206677,-1.30376816 51.11133597,-1.29261017 51.12981493,-1.27510071 51.15906713,-1.27057314 51.16440941,-1.26606703 51.16897072,-1.26235485 51.17439257,-1.26089573 51.17875111,-1.26044512 51.1833917,-1.25793457 51.19727033,-1.25669003 51.20141159,-1.25347137 51.20630532,-1.24845028 51.21110444,-1.23325825 51.22457158,-1.2274003 51.22821321,-1.22038364 51.23103494,-1.20326042 51.23596583,-1.1776185 51.24346193,-1.16356373 51.24968088,-1.13167763 51.26363353,-1.12247229 51.2659966,-1.11629248 51.26682901,-1.10906124 51.26728549,-1.09052181 51.26823871,-1.08522177 51.26885628,-1.07013702 51.27070895,-1.03683472 51.27350122,-1.00917578 51.27572955,-0.98243952 51.2779175,-0.9509182 51.28095094,-0.9267354 51.28305811,-0.90499878 51.28511151,-0.86051702 51.2883055,-0.83661318 51.29023789,-0.7534647 51.29708113,-0.74908733 51.29795323,-0.7400322 51.2988924,-0.71535587 51.30125366,-0.68475723 51.29863749,-0.65746307 51.30220618,-0.63246489 51.30380261,-0.60542822 51.30645873,-0.58150291 51.3103219,-0.57603121 51.31150225,-0.57062387 51.31317883,-0.54195642 51.32475227,-0.4855442 51.34771616,-0.4553318 51.36283147)
This is in a column called "geom" in my table, called "fibre_lines". When I run the following query,
select st_length(geography(geom), false) as full_length,
st_length(geography(st_line_substring(geom, 0, 1)), false) as full_length_2,
st_length(geography(st_line_substring(geom, 0, 0.5)), false) as first_half,
st_length(geography(st_line_substring(geom, 0.5, 1)), false) as second_half
from fibre_lines
where id = 10;
I get the following result...
76399.4939375278 76399.4939375278 41008.9667229201 35390.5272197668
The first two make sense to me, they are simply the length of my line assuming a spherical earth. The first is just using the obvious function while the second is using st_line_substring to get the length of the entire line. These two values agree.
But the last two have me puzzled. I am asking for the length of the first half of the line, then I'm asking for the length of the last half. My expectation was that these would be equal or nearly equal. Instead the first half is about 6km longer than the second half.
If you plot the geometry on the map you will see that the first third of the line is fairly north/south oriented and the remaining two thirds are more east/west. I wouldn't have thought that would make a difference when asking for the length on a spherical earth, but I am happy to be told that I'm wrong (so long as it is also explained why I'm wrong).
For reference the PostGIS I am using is 1.5.8. If this is a bug, upgrading to a newer version is possible, but not trivial, so I would prefer to only do that if it is necessary.
Anyone have ideas?
While Arunas' comments didn't directly answer my question, they did lead me to some research that I think identifies the problem. I'm posting it here partly to get it straight in my own mind and partly in case others are wondering.
It seems the key is the PostGIS distinction between a "geometry" and a "geography". A geometry is a 2D planar shape, typically held in a projected coordinate system such as UTM, i.e. a (configurable) projection of the globe onto a flat surface. A geography, on the other hand, is designed to store latitude/longitude information specifically and is used to work on either a sphere or a spheroid. So the essential problem I have is twofold:
Perhaps not obvious from my original post is that I am using a geometry object to store lat/long information rather than UTMs. I cast that to a geography most of the time so that I get the correct answers, but it would be more correct if I actually stored it as a geography object. That would eliminate the need for a number of the casts in my code as well as allow PostGIS to tell me when I am doing something wrong.
While ST_Length will work with either a geometry or a geography, ST_Line_Substring only works with geometries. Hence when I ask it for the halfway point, I am asking it for the halfway point of a flat geometry. This will give me the correct answer for the latitude coordinate, but for the longitude it will have an error term that increases (for most projections) the farther I am from the equator.
I've looked into newer versions of PostGIS and they don't seem to have an ST_Line_Substring equivalent that will give me the 50% point of a geography, so I will have to do it the "hard" way: use ST_Length to get all my segment lengths, then add them up and do the math needed for my interpolation.
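For reference, here is a sketch of that "hard" way, assuming the same fibre_lines(id, geom) layout as the query above, with geom stored as lon/lat. It returns the geographic length of every vertex-to-vertex segment; summing these (client-side or with a window function) gives the cumulative distances needed to interpolate the true 50% point:

SELECT n,
       ST_Distance(geography(ST_PointN(geom, n)),
                   geography(ST_PointN(geom, n + 1))) AS segment_length
FROM fibre_lines,
     generate_series(1, (SELECT ST_NPoints(geom) - 1
                         FROM fibre_lines WHERE id = 10)) AS n
WHERE id = 10
ORDER BY n;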
Sorry, I can't add comments, so I will provide this as an answer.
I experienced the same problem and resolved it by transforming my lat/lon geometries to UTM geometries inside the ST_Line_Substring call. Then I was getting sub-geometries with the proper length. Of course, I had to transform them back to lat/lon afterward.
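For anyone wanting the concrete form of that workaround, it would look something like this (a sketch, assuming the geometries carry SRID 4326 and lie in UTM zone 30N / EPSG:32630, which fits the coordinates in the question; pick the UTM zone matching your own data):

SELECT ST_Transform(
         ST_Line_Substring(
           ST_Transform(geom, 32630),  -- lat/lon -> planar UTM metres
           0, 0.5),                    -- halfway measured in metres rather than degrees
         4326) AS first_half
FROM fibre_lines
WHERE id = 10;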

Issue with OpenStreetMap administration area rendering

I imported the latest OpenStreetMap data into a local SQL Server db and generated the "outer" shape for an administrative area. The original area shape is available here:
http://www.openstreetmap.org/relation/2658573
What I received is below (screenshot omitted):
What I did: I took the "outer" way definitions for this shape (mentioned above) in the order defined in the db, and constructed a POLYGON definition from the points belonging to all the ways, in that order.
To import the data into the SQL Server db I used the OsmSharp.Core NuGet package.
The orientation of the shape data is clockwise, as per the info on this page (see the Naming and Orientation section):
http://wiki.openstreetmap.org/wiki/Talk:Relation:multipolygon
My questions are:
Does anybody know why taking the direct definitions of the "ways"/lines describing an area shape doesn't work as it should?
Should I reject some points from the line/way definitions?
Maybe I just missed something else...
Edit 1:
I did a quick query to verify the info from scai and it looks promising (result screenshot omitted):
The NodeNr column shows which point in the way/line definition I should take as the start/end point for the polygon definition. WayNr shows which consecutive way shares a point with which one.
Edit 2:
I have now tested the data related to the way definitions, and it looks like the main issue is only with the order of nodes in the way definitions. All the tested ways used all the nodes belonging to them (there were no redundant points, even where one way might cross another in its middle).
Some sample data (screenshot omitted):
The definition of the way with nr 42 (WayNr = 42) shows that the start point of this line (the continuation of way nr 41) is at position 44. That means point/node 44 in the definition of way 42 has the same GPS coordinates as the point at position 30 in the definition of way nr 41.
After taking such "reverted" lines in the correct order, I received the expected outline (screenshot omitted).
After adding an inner area (exclusion), it finally looks like the original shape (screenshot omitted).
The other issue with OpenStreetMap data is that it uses the right-hand rule to define the outer ring of a shape (clockwise instead of counter-clockwise), so in the case of SQL Server we have to call the ReorientObject() method on the Geography instance. To detect the orientation of a shape you can use, for instance, the EnvelopeAngle() method on the Geography object.
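That reorientation step looks roughly like this in T-SQL (a sketch; the small clockwise ring is hypothetical, and EnvelopeAngle() > 90 is used as the sign that the ring encloses more than a hemisphere, i.e. was wound the wrong way):

DECLARE @g geography = geography::STGeomFromText(
    'POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))', 4326);  -- outer ring wound clockwise
IF @g.EnvelopeAngle() > 90
    SET @g = @g.ReorientObject();  -- flip the ring so the small area becomes the interior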
Edit 3:
I received an answer on the OSM forum: the order of ways in a relation's members doesn't matter (we can't rely on it), so the ways have to be reordered before any further processing anyway...
There are already various similar questions here at Stack Overflow and at gis.stackexchange.com.
Relations don't necessarily contain the ways in the correct order, so it will be necessary to order the ways yourself. Take the ID of the last node of a way and search for the way whose start node has the same ID. If you do this for all ways, the resulting geometry should be fine.
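A sketch of that chaining in T-SQL, assuming a hypothetical helper table ways(way_id, first_node, last_node) holding the member ways of a single relation ring. It also covers the reversed-way case from Edit 2 by tracking the current free end of the chain; it does not handle branching:

WITH ordered AS (
    SELECT way_id, last_node AS chain_end, 1 AS seq
    FROM ways
    WHERE way_id = 1                            -- start from an arbitrary member way
    UNION ALL
    SELECT w.way_id,
           CASE WHEN w.first_node = o.chain_end
                THEN w.last_node                -- way already runs in chain direction
                ELSE w.first_node               -- way has to be reversed before use
           END,
           o.seq + 1
    FROM ways AS w
    JOIN ordered AS o
      ON o.chain_end IN (w.first_node, w.last_node)
     AND w.way_id <> o.way_id                   -- don't bounce straight back
    WHERE o.seq < (SELECT COUNT(*) FROM ways)   -- stop once the ring closes
)
SELECT way_id, seq FROM ordered ORDER BY seq;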

Solving the problem of finding parts which work well with each other

I have a database of items. They are car parts, and similar parts (e.g. cams/pistons) work better in some combinations than in others (e.g. one product will work well with another, while a different combination of 2 parts may not).
There are so many possible permutations; what solutions apply to this problem?
So far, I feel that these are possible approaches (where I have question marks, something tells me these are solutions, but I am not 100% confident they are):
Neural networks (?)
Collection-based approach (selection of parts in a collection for cam, and likewise for pistons in another collection, all work well with each other)
Business rules engine (?)
What are good ways to tackle this sort of problem?
Thanks
The answer largely depends on how you calculate 'works better'.
1) Independent values
Assuming that 'works better' is a function f of a combination of items x = (a, b, c, d, ...), and(!) that there are no regularities that could be used to decide whether f(x') is bigger or smaller than f(x) knowing only x, f(x), and x' (regularities could allow finding the maximum faster), you will have to calculate f for all combinations at least once.
Once you have calculated it for all combinations, you can sort. If you need to look the data up in a partitioned way (for example, finding the top 5 best solutions but without such and such a part), using SQL/an RDBMS might be a good approach.
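For instance, with a hypothetical precomputed table scores(combo_id, cam_id, piston_id, score), that kind of partitioned lookup is a short query (PostgreSQL-style LIMIT shown):

SELECT combo_id, score
FROM scores
WHERE cam_id <> 'cam_x17'    -- "without such and such part" (hypothetical part id)
ORDER BY score DESC          -- best-scoring combinations first
LIMIT 5;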
For extra points, after calculating all of the results and storing them, you could analyze them statistically and try to establish patterns.
2) Dependent values
If you can establish some regularities regarding the values (and maybe you can), the search for the max value can be simplified and sped up.
For example, if you know that the function you are trying to maximize is a linear combination of all the parameters, you could look into linear programming.
If it is not...
