Converting geometry to JSON in PostGIS creates extra rows

I want to plot municipal borders in Apache Superset. Superset provides maps of countries and provinces using the ISO-3166-2 standard, but I would like to plot municipalities as well. There is a shape file with these borders, and I converted it to a PostGIS table following instructions on this site: I used shp2pgsql to load the shape file into the PostGIS database. That resulted in a table containing some identifying information and a binary geometry column.
The first row of the table is 231,375 characters long and looks like this (abbreviated):
"gid" "gm_code" "gm_naam" "shape_leng" "shape_area" "geom"
1 "GM0034" "Almere" 122665.358635 109562253.490 "0106000020E61000000300000001030000000100000017000000806CE7FB2F560241408B6CE73E441D41801C5A643A520...
This information is not readable by Superset, so I converted it to GeoJSON with this query:
CREATE TABLE gis.gemeente_2021_json AS
SELECT gm_code, gm_naam, shape_area,
       json_build_object(
           'type', 'Polygon',
           'geometry', ST_AsGeoJSON(ST_Transform(
               (ST_Dump(geom)).geom::geometry(Polygon, 4326), 4326))::json
       )::text AS geojson
FROM gis.gemeente_2021_v1;
What happens is that I started with 435 municipalities, but my new table has 1140 entries.
Some sample output is shown below. The lines are very long, so I cut them and replaced the rest with ellipses (...). As you can see, municipality "GM0034" suddenly has three rows with different entries in the geojson column, while I expected just one row with a very long GeoJSON string.
gm_code gm_naam shape_area geojson
GM0034 Almere 109562253.49 {"type" : "Polygon", "geometry" : {"type":"Polygon","coordinates":[[[150213.998,479503.726],[150087.298999999,479382.379000001],[150000.420000002,479461.258000001],[150000.354600001,479461.317400001],[150000.366300002,479461.327300001],[150001.45630,...]]]}}
GM0034 Almere 109562253.49 {"type" : "Polygon", "geometry" : {"type":"Polygon","coordinates":[[[141872.969,483192.398800001],[141872.978100002,483192.398499999],[141872.984099999,483192.398499999],[141904.523899999,483191.793400001],[141912.174600001,483191.646699998],[141912.340999998, ...
GM0034 Almere 109562253.49 {"type" : "Polygon", "geometry" : {"type":"Polygon","coordinates":[[[144312.481800001,492971.460499998],[144312.557,492971.443999998],[144312.633299999,492971.445799999],[144316.953000002,492971.539999999],[144321.4701,492972.263300002],[144321.730999999,492972.305], ...
The data may be MultiPolygon, though the documentation mentions Polygon. I tried that, but it yielded basically the same results.
I checked the Postgres documentation on maximum line length, but the limit is about 1 GB of text per value; although the lines are long, 250K characters can easily be handled by Postgres.
Any suggestion on where I should look further is welcome.
Edit 1
As @JGH rightfully pointed out, I'd made quite a mistake with the SRIDs. The shape file's SRID is 28992. I deleted the table and created it anew with the correct SRID; it then displayed correctly on OpenStreetMap. It is converted as follows:
CREATE TABLE gis.gemeente_2021_json AS
SELECT gm_code, gm_naam, shape_area,
       json_build_object(
           'type', 'Polygon',
           'geometry', ST_AsGeoJSON(ST_Transform(
               (ST_Dump(geom)).geom::geometry(Polygon, 28992), 4326))::json
       )::text AS geojson
FROM gis.gemeente_2021_v1;
The same error applies, alas. Still 3 rows for GM0034.
Edit 2
I adjusted the query according to the suggestions of @JGH:
CREATE TABLE gis.gemeente_2021_json AS
SELECT gm_code, gm_naam, shape_area,
       json_build_object(
           'type', 'MultiPolygon',
           'geometry', ST_AsGeoJSON(ST_Transform(
               geom::geometry(MultiPolygon, 28992), 4326))::json
       )::text AS geojson
FROM gis.gemeente_2021_v1;
and that worked.

The data shows coordinates like [150213.998,479503.726], yet the code contains the line geom::geometry(Polygon, 4326) followed by a useless transform to 4326. The shown coordinates are not in 4326; they have probably been loaded without a defined CRS (so 0 is assigned and the cast works).
The data is therefore declared to be in lat-long 4326 but contains values from another CRS, so you end up with artistic coordinates which can't be properly handled (maybe the geometry goes several times around the earth, maybe it goes to the pole and back, etc.; anything can happen). Garbage in, garbage out.
So the first step is to set the proper CRS. Then you can apply a useful transform from this other CRS to 4326. If you want single parts, keep using the dump and handle the attribute repetitions. If you want multi-parts, remove the dump and set the GeoJSON type to MultiPolygon.
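The row multiplication from ST_Dump can be sketched outside the database. A minimal Python illustration (toy geometry, not the real municipality data): a multi-part geometry with three parts produces three rows, each repeating the attributes, while keeping the MultiPolygon intact yields one row.

```python
# Toy illustration (not the real municipality data): one feature whose
# geometry has three polygon parts, mimicking a MultiPolygon.
municipality = {
    "gm_code": "GM0034",
    "gm_naam": "Almere",
    "parts": [  # three dummy polygon rings
        [[(0, 0), (1, 0), (1, 1), (0, 0)]],
        [[(2, 2), (3, 2), (3, 3), (2, 2)]],
        [[(4, 4), (5, 4), (5, 5), (4, 4)]],
    ],
}

# ST_Dump-style: one output row per part, attributes repeated each time.
dumped_rows = [
    {"gm_code": municipality["gm_code"], "gm_naam": municipality["gm_naam"],
     "geojson": {"type": "Polygon", "coordinates": part}}
    for part in municipality["parts"]
]

# MultiPolygon-style: a single row keeping all parts together.
multi_row = {"gm_code": municipality["gm_code"],
             "gm_naam": municipality["gm_naam"],
             "geojson": {"type": "MultiPolygon",
                         "coordinates": municipality["parts"]}}

print(len(dumped_rows))  # the one municipality became three rows
```

This is exactly why 435 municipalities became 1140 rows: each multi-part municipality contributed one row per part.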

Related

No other way than dropping the duplicates, if ValueError: Index contains duplicate entries, cannot reshape?

Hi everyone, this is my first question.
I'm working on a dataset from patients who underwent urine analysis.
Every row refers to a single Patient ID, and every Request ID can refer to different types of urine analysis (aspect, colour, number of erythrocytes, bacteria, and so on).
I've added an image to help you understand my dataset.
I'd like to reshape it so that one request = one row, with all the tests done in the same request on the same row.
After that I want to merge it with another dataframe that I reshaped by Request ID (because the first was missing a "long result" column, which I downloaded from another software in use in our hospital).
I've tried:
df_pivot = df.pivot(index='Id Richiesta', columns = 'Nome Analisi Elementare', values = 'Risultato')
df_pivot.reset_index(inplace=True)
After that I want to do:
df_merge = pd.merge(df_pivot, df, how='left', on='Id Richiesta')
I tried this once with another dataset, but there I had to drop_duplicates for another purpose, and it worked. This time, however, I have to analyse all the features. What can I do? Is there no other way than dropping the duplicates?
Thank you for any help! :)
I've studied my data further and discovered one duplicate bacteria result for the same request ID (1 in almost 8 million entries...). I found it with:
df[df[['Id Richiesta', 'Id Analisi Elementare', 'Risultato']].duplicated()]
Then I visualized all the rows referring to that "Id Richiesta" and kept the last one (they were the same).
Thank you, and sorry. Please tell me if I should delete this question.
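For reference, the duplicate problem and the "keep last" fix can be sketched with pandas. This is toy data using the question's column names (not the real hospital data); pivot_table with aggfunc="last" replaces the failing pivot:

```python
import pandas as pd

# Toy data shaped like the question: request 2 has a duplicated
# (request, test) pair for "batteri".
df = pd.DataFrame({
    "Id Richiesta": [1, 1, 2, 2, 2],
    "Nome Analisi Elementare": ["colore", "batteri", "colore", "batteri", "batteri"],
    "Risultato": ["giallo", "pochi", "giallo", "molti", "molti"],
})

# df.pivot(...) would raise "ValueError: Index contains duplicate entries"
# here, because request 2 has two "batteri" rows. pivot_table with
# aggfunc="last" keeps the last value, matching the approach above.
df_pivot = df.pivot_table(index="Id Richiesta",
                          columns="Nome Analisi Elementare",
                          values="Risultato",
                          aggfunc="last")
df_pivot.reset_index(inplace=True)
print(df_pivot)
```

When the duplicates carry genuinely different values, a list-valued aggfunc (e.g. aggfunc=list) keeps them all instead of silently discarding data.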

Matching and replacing a selection of data from two different dataframes

(First time posting, so please bear with me.) I have two dataframes, the second of which contains a column of replacement data for a selection of data within the first.
# dataframe 1
df <- data.frame(site = rep(1:4, 3), landings = rep("val", 12),
                 harbour = c("a","b","c","d","e","f","g","h","i","j","k","l"))
# dataframe 2
new_site4 <- data.frame(harbour = c("a","b","c","d","e","f","g","h","i","j","k","l"),
                        sub_site = c("x","x","y","x","y","y","y","x","y","x","y","y"))
I want to replace the "site" in dataframe 1 with the "sub_site" in dataframe 2 based on the match of "harbour"; however, I only need to do it for records with site "4".
Is there a neat way to select only site 4 and then replace the site number with the sub_site, ideally without merging or creating a whole new dataframe? My real dataset is large, but the key is small, as it only refers to the small selection of the data which needs the sub_site added.
I tried using match() on my main dataset, but for some reason it only matched some of the required data, not all of it; this code won't work on my sample data either:
# df$site[match(df$harbour, new_site4$harbour)] <- new_site4$sub_site[match(df$harbour, df$harbour)]
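For comparison outside R, the same keyed replacement restricted to site 4 can be sketched in Python with a plain dictionary lookup (illustrative only, using the sample data above):

```python
# Toy version of the two data frames from the question.
sites = [1, 2, 3, 4] * 3
harbours = list("abcdefghijkl")
rows = [{"site": s, "harbour": h} for s, h in zip(sites, harbours)]

# Lookup key from dataframe 2: harbour -> sub_site.
sub_site = dict(zip("abcdefghijkl", "xxyxyyyxyxyy"))

# Replace "site" only where site == 4, keyed on harbour.
for row in rows:
    if row["site"] == 4:
        row["site"] = sub_site[row["harbour"]]
```

The point of the sketch is the two-step logic: first select only the site-4 records, then look up the replacement by harbour, which is what the failed match() attempt above skips.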

How to query specific information from a JSON column?

I have searched Stack Overflow to get an answer to my question, but while I found many interesting cases, none of them quite address mine.
I have a column called fields in my data that contains JSON information, as presented below:
Row Fields
1 [{"label":"Label 1","key":"label_1","description":"Value of label_1"},{"label":"Label 2","key":"label_2","error":"Something"}]
2 [{"description":"something","label":"Row 1","key":"row_1"},{"label":"Row 2","message":"message_1","key":"row_2"}]
In essence, I have many rows of JSON that contain label and key, and a bunch of other parameters like that. From every {}, I want to extract only label and key, and then (optionally, but ideally) put every label and key from every {} on its own row. So, as a result, I would have the following output:
Row Label Key
1 Label 1 label_1
1 Label 2 label_2
2 Row 1 row_1
2 Row 2 row_2
Please note, the contents of label and key within the JSON can be anything (strings, integers, special characters, a mix of everything, etc.). In addition, key and label can be anywhere in relation to the other parameters within each {}.
Here is the BigQuery SQL dummy data for convenience:
SELECT '1' AS Row, '[{"label":"Label 1","key":"label_1","description":"Value of label_1"},{"label":"Label 2","key":"label_2","error":"Something"}]' AS Fields
UNION ALL
SELECT '2' AS Row, '[{"description":"something","label":"Row 1","key":"row_1"},{"label":"Row 2","message":"message_1","key":"row_2"}]' AS Fields
I first thought of using a regex to isolate all the brackets and only show me information with label and key. Then I looked into the BQ documentation on JSON functions and got very stuck on the json_path parameters, specifically because their examples don't match mine.
Consider below approach
select `row`,
json_extract_scalar(el, '$.label') label,
json_extract_scalar(el, '$.key') key
from your_table, unnest(json_extract_array(fields)) el
If applied to the sample data in your question, the output is the four-row label/key table expected above.
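The same label/key extraction can be sanity-checked outside BigQuery with plain Python and the stdlib json module (same sample rows; json.loads plays the role of json_extract_array, and dict lookups replace json_extract_scalar):

```python
import json

# The two sample rows from the question: (row id, fields JSON string).
rows = [
    ("1", '[{"label":"Label 1","key":"label_1","description":"Value of label_1"},'
          '{"label":"Label 2","key":"label_2","error":"Something"}]'),
    ("2", '[{"description":"something","label":"Row 1","key":"row_1"},'
          '{"label":"Row 2","message":"message_1","key":"row_2"}]'),
]

# Mirror of: unnest(json_extract_array(fields)) el
#            + json_extract_scalar(el, '$.label') / '$.key'
result = [
    (row_id, el.get("label"), el.get("key"))
    for row_id, fields in rows
    for el in json.loads(fields)
]

for r in result:
    print(r)
```

Because each array element is parsed as a dict, the position of label and key relative to the other parameters does not matter, which addresses the concern in the question.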

ImportJSON for Google Sheets Can't Handle File WIthout Properties?

I'm trying to pull historical pricing data from CoinGecko's free API to use in a Google Sheet. It presents OHLC numbers in the following format:
[
[1589155200000,0.05129,0.05129,0.047632,0.047632],
[1589500800000,0.047784,0.052329,0.047784,0.052329],
[1589846400000,0.049656,0.053302,0.049656,0.053302],
...
]
As you can see, this isn't typical JSON format since there are no property names. So that everyone is on the same page, for this data the properties of each subarray in order are Time (in UNIX epoch format), Open Price, High Price, Low Price, and Close Price.
I'm using the ImportJSON code found here to try and pull this data, but it does not work. Instead of putting each subarray into a separate row, split into columns for the 5 properties, it prints everything out into a single cell like so:
1589155200000,0.05129,0.05129,0.047632,0.047632,1589500800000,0.047784,0.052329,0.047784,0.052329,1589846400000,0.049656,0.053302,0.049656,0.053302,...
This is incredibly unhelpful. I'm trying to avoid using a paid API add-on since I really don't want to have to pay the frankly exorbitant fees they want to charge, but I can't figure out a way to get ImportJSON to play nicely with this data. Does anyone know of a solution?
It's simpler: your data is in an array structure. I put
[
[1589155200000,0.05129,0.05129,0.047632,0.047632],
[1589500800000,0.047784,0.052329,0.047784,0.052329],
[1589846400000,0.049656,0.053302,0.049656,0.053302]
]
in A1, and I get the individual values this simpler way:
function myArray() {
  var f = SpreadsheetApp.getActiveSheet();
  // Parse the array literal stored as text in A1.
  var result = eval(f.getRange('A1').getValue());
  // Write one row per subarray, one column per value.
  f.getRange(2, 1, result.length, result[0].length).setValues(result);
}
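For what it's worth, the CoinGecko payload is plain JSON, just an array of arrays with no property names, so any JSON parser turns it into a 2D table of rows directly (illustrated here in Python with the stdlib json module; eval is not needed for parsing):

```python
import json

# Sample OHLC payload from the question: rows of
# [time, open, high, low, close] with no property names.
payload = """[
    [1589155200000,0.05129,0.05129,0.047632,0.047632],
    [1589500800000,0.047784,0.052329,0.047784,0.052329],
    [1589846400000,0.049656,0.053302,0.049656,0.053302]
]"""

rows = json.loads(payload)  # a list of 5-element lists, one per candle

# Attach the documented column order to each row.
header = ["time", "open", "high", "low", "close"]
for row in rows:
    print(dict(zip(header, row)))
```

The column meanings (Time, Open, High, Low, Close) come from the question, not from the data itself; that is exactly why property-based importers like ImportJSON struggle with it.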

How to get the bounding coordinates for a US postal(zip) code?

Is there a service/API that will take a postal/zip code and return the bounding(perimeter) coordinates so I can build a Geometry object in a MS SQL database?
By bounding coordinates, I mean I would like to retrieve a list of GPS coordinates that construct a polygon that defines the US zip code.
An elaboration of my comment, that ZIP codes are not polygons....
We often think of ZIP codes as areas (polygons) because we say, "Oh, I live in this ZIP code..." which gives the impression of a containing region, and maybe the fact that ZIP stands for "Zone Improvement Plan" helps the false association with polygons.
In actuality, ZIP codes are lines which represent, in a sense, mail carrier routes. Geometrically, lines do not have area. Just as lines are strings of points along a coordinate plane, ZIP code lines are strings of delivery points in the abstract space of USPS-designated addresses.
They are not correlated to geographical coordinates. What you will find, though, is that they appear to be geographically oriented because it would be inefficient for carriers to have a route completely irrelevant of distance and location.
What is this "abstract space of USPS-designated addresses"? That's how I am describing the large and mysterious database of deliverable locations maintained by the US Postal Service. Addresses are not allotted based on geography, but on the routes that carriers travel which usually relates to streets and travelability.
Some 5-digit ZIP codes are only a single building, or a complex of buildings, or even a single floor of a building (yes, multiple zip codes can be at a single coordinate because their delivery points are layered vertically). Some of these -- among others -- are "unique" ZIPs. Companies and universities frequently get their own ZIP codes for marketing or organizational purposes. For instance, the ZIP code "12345" belongs to General Electric up in Schenectady, NY. (Edit: In a previous version of Google Maps, when you follow that link, you'd notice that the placement marker was hovering, because it points to a ZIP code, which is not a coordinate. While most US ZIP codes used to show a region on Google Maps, these types cannot because the USPS does not "own" them, so to speak, and they have no area.)
Just for fun, let's try verifying an address in a unique ZIP code. Head over to SmartyStreets and punch in a bogus address in 12345, like:
Street: 999 Sdf sdf
ZIP Code: 12345
When you try to verify that, notice that... it's VALID! Why? The USPS will deliver a piece to the receptacle for that unique ZIP code, but at that point, it's up to GE to distribute it. Pretty much anything internal to the ZIP code is irrelevant to the USPS, including the street address (technically "delivery line 1"). Many universities function in a similar manner. Here's more information regarding that.
Now, try the same bogus address, but without a ZIP code, and instead do the city/state:
Street: 999 Sdf sdf
City: Schenectady
State: NY
It doesn't validate. This is because even though Schenectady contains 12345, where the address is "valid," it geometrically intersects with the "real" ZIP codes for Schenectady.
Take another instance: military. Certain naval ships have their own ZIP codes. Military addresses are an entirely different class of addresses using the same namespace. Ships move. Geographical coordinates don't.
ZIP precision is another fun one. 5-digit ZIP codes are the least "precise" (though the term "specific" might be more meaningful here, since ZIP codes don't pinpoint anything). 7- and 9-digit ZIP codes are the most specific, often down to block or neighborhood level in urban areas. But since each ZIP code is a different size, it's really hard to tell what actual distances you're talking about.
A 9-digit ZIP code might be portioned to a floor of a building, so there you have overlapping ZIP codes for potentially hundreds of addresses.
Bottom line: ZIP codes don't, contrary to popular belief, provide geographical or boundary data. They vary widely and are actually quite un-helpful unless you're delivering mail or packages... but the USPS' job was to design efficient carrier routes, not partition the population into coordinate regions so much.
That's more the job of the census bureau. They've compiled a list of cartographic boundaries since ZIP codes are "convenient" to work with. To do this, they sectioned bunches of addresses into census blocks. Then, they aggregated USPS ZIP code data to find the relation between their census blocks (which has some rough coordinate data) and the ZIP codes. Thus, we have approximations of what it would look like to plot a line as a polygon. (Apparently, they converted a 1D line into a 2D polygon by transforming a 2D polygon based on its contents to fit linear data -- for each non-unique, regular ZIP code.)
From their website (link above):
A ZIP Code tabulation area (ZCTA) is a statistical geographic entity that approximates the delivery area for a U.S. Postal Service five-digit or three-digit ZIP Code. ZCTAs are aggregations of census blocks that have the same predominant ZIP Code associated with the addresses in the U.S. Census Bureau's Master Address File (MAF). Three-digit ZCTA codes are applied to large contiguous areas for which the U.S. Census Bureau does not have five-digit ZIP Code information in its MAF. ZCTAs do not precisely depict ZIP Code delivery areas, and do not include all ZIP Codes used for mail delivery. The U.S. Census Bureau has established ZCTAs as a new geographic entity similar to, but replacing, data tabulations for ZIP Codes undertaken in conjunction with the 1990 and earlier censuses.
The USCB's dataset is incomplete, and at times inaccurate. Google still has holes in their data, too (12345 is a good example), but Google will patch it eventually by going over each address and ZIP code by hand. They do this already, but haven't made all their map data perfect quite yet. Naturally, access to this data is limited by API terms, and it's very expensive to raise those limits.
Phew. I'm beat. I hope that helps clarify things. Disclaimer: I used to be a developer at SmartyStreets. More information on geocoding with address data.
Even more information about ZIP codes.
What you are asking for is a service providing "free ZIP code geocoding". There are a few out there, of varying quality. You're going to have a bad time coding something like this yourself, for a few reasons:
ZIP codes can be assigned to a single building or to a post office.
ZIP codes are NOT considered a polygonal area. Projecting ZIP codes onto polygonal areas requires you to make an educated guess where the boundary lies between one ZIP code and the next.
ZIP code address data specifies only a center location for the ZIP code, giving the general vicinity of an address. Mailing addresses that lie between one ZIP code and another can be in dispute as to which ZIP code they actually fall in: a mailing address may be physically closer to ZIP code 11111, yet its official ZIP code is the more distant ZIP code point 11112.
Google Maps has a geocoding API:
The Google Maps API is client-side JavaScript. You can directly query the geocoding system from PHP using an HTTP request. However, Google Maps only gives you what the United States Postal Service gives them: a point representing the center of the ZIP code.
https://developers.google.com/maps/#Geocoding_Examples
map city/zipcode polygons using google maps
Thoughts on projecting a zipcode to its lat/long bounding box
There are approximately 43,000 ZIP Codes in the United States. This number fluctuates from month to month, depending on the number of changes made. The zipcodes used by the USPS are not represented as polygons and do not have hard and fast boundaries.
The USPS (United States Postal Service) is the authority that defines each zipcode lat/long. Any software which resolves a zipcode to a geographical location would be in need of weekly updates. One company called alignstar provides demographics and GIS data of zipcodes ( http://www.alignstar.com/data.html ).
Given a physical (mailing) address, find the geographical coordinates in order to display that location on a map.
If you want to reliably project what shape the zipcode is in, you are going to need to brute force it and ask: "give me every street address by zipcode", then paint boxes around those mis-shapen blobs. Then you can get a general feel for what geographical areas the zipcodes cover.
http://vterrain.org/Culture/geocoding.html
If you were to throw millions of mailing address points into an algorithm resolving every one to a lat/long, you might be able to build a rudimentary blob bounding box of that zipcode. You would have to re-run this algorithm and it would theoretically heal itself whenever the zipcode numbers move.
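That "rudimentary blob bounding box" is easy to sketch: given geocoded lat/long points for every address in a ZIP code, the box is just the min/max of each axis. A minimal Python illustration with hypothetical points (not real delivery addresses):

```python
# Hypothetical geocoded delivery points for one ZIP code (lat, lon).
points = [(40.7128, -74.0060), (40.7306, -73.9866), (40.7484, -73.9857)]

lats = [p[0] for p in points]
lons = [p[1] for p in points]

# Bounding box as (south, west, north, east).
bbox = (min(lats), min(lons), max(lats), max(lons))
print(bbox)
```

A real implementation would also need the weekly re-runs described above, since the set of delivery points behind a ZIP code keeps changing.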
Other ideas
http://shop.delorme.com/OA_HTML/DELibeCCtpSctDspRte.jsp?section=10075
http://www.zip-codes.com/zip-code-map-boundary-data.asp
Step 1: download cb_2018_us_zcta510_500k.zip from
https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
If you want to keep the boundaries in MySQL:
Step 2: create a database named spatialdata in MySQL, then run this command:
ogr2ogr -f "MySQL" MYSQL:"spatialdata,host=localhost,user=root" -nln "map" -a_srs "EPSG:4683" cb_2018_us_zcta510_500k.shp -overwrite -addfields -fieldTypeToString All -lco ENGINE=MyISAM
I uploaded the file on GitHub (https://github.com/sahilkashyap64/USA-zipcode-boundary/blob/master/USAspatialdata.zip).
In your spatialdata database there will be two tables, map and geometry_columns.
In 'map' there is a column named "shape"; it is of type "geometry" and contains polygons/multipolygons. In 'geometry_columns' the SRID is defined.
To check whether a point falls within a polygon:
SELECT * FROM map WHERE ST_Contains( map.SHAPE, ST_GeomFromText( 'POINT(63.39550 -148.89730 )', 4683 ) )
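ST_Contains does the point-in-polygon test inside MySQL; the underlying idea for a simple polygon (ignoring SRIDs and points exactly on an edge) is the even-odd ray-casting rule, sketched here in Python:

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: count how many edges a rightward ray crosses.

    polygon is a list of (x, y) vertices; the ring need not repeat its
    first vertex. Simplified sketch: points exactly on an edge are not
    handled specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, square))  # inside the square
print(point_in_polygon(5, 2, square))  # outside the square
```

Real ZCTA boundaries are multipolygons with holes, so in practice you would keep this test in the database (or a geometry library) rather than hand-roll it.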
To show the boundary on a map:
select zcta5ce10 as zipcode, ST_AsGeoJSON(SHAPE) sh from map where ST_Contains( map.SHAPE, ST_GeomFromText( 'POINT(34.1116 -85.6092 )', 4683 ) )
"ST_AsGeoJSON" this returns spatial data as geojson.
Use http://geojson.tools/
"HERE maps" to check the shape of geojson
If you want to generate TopoJSON: mapshaper converts the shapefile to TopoJSON directly (no need to convert it to a KML file):
npx -p mapshaper mapshaper-xl cb_2018_us_zcta510_500k.shp snap -simplify 0.1% -filter-fields ZCTA5CE10 -rename-fields zip=ZCTA5CE10 -o format=topojson cb_2018_us_zcta510_500k.json
If you want to convert the shapefile to KML:
ogr2ogr -f KML tl_2019_us_zcta510.kml -mapFieldType Integer64=Real tl_2019_us_zcta510.shp
I have used Mapbox GL to display two zipcodes.
Example: https://sahilkashyap64.github.io/USA-zipcode-boundary/
Code: https://github.com/sahilkashyap64/USA-zipcode-boundary
SQL Server Solution
Download the Shape files from the US Census:
https://catalog.data.gov/dataset/2019-cartographic-boundary-shapefile-2010-zip-code-tabulation-areas-for-united-states-1-500000
I then found this repository to import the shape file into SQL Server; it was very fast and required no additional coding: https://github.com/xfischer/Shape2SqlServer
Then I could write my own script to find out which zip codes are in a polygon I created:
DECLARE @polygon GEOMETRY;
DECLARE @isValid bit = 0;
DECLARE @p nvarchar(2048) = 'POLYGON((-120.1547 39.2472,-120.3758 39.1950,-120.2124 38.7734,-119.6590 38.8162,-119.6342 39.3672,-120.1836 39.2525,-120.1547 39.2472))';
SET @polygon = GEOMETRY::STPolyFromText(@p, 4326);
SET @isValid = @polygon.STIsValid();
IF (@isValid = 0)
    SET @polygon = @polygon.MakeValid();
SET @isValid = @polygon.STIsValid();
IF (@isValid = 1)
BEGIN
    SELECT * FROM cb_2019_us_zcta510_500k
    WHERE geom.STIntersects(@polygon) = 1
END
ELSE
    SELECT 'Polygon not valid';
I think this is what you need; it uses the US Census as its repository: US Zipcode
Boundaries API: https://www.boundaries-io.com
The above API returns US boundaries (GeoJSON) by zipcode, city, and state. You should use the API programmatically to handle large results.
Disclaimer: I work here.
I think a world GeoJSON file and the Google Maps Geocoding API can help you.
For example: use the Geocoding API to look up the ZIP code; you will get the city, state, and country. Then search the world and US GeoJSON for the boundary. I have an example of a US state boundary, like dsdlink.
