I'm creating a spatial table in PostGIS. My geometries will be abstract, not tied to the real world, so I need some placeholder SRID. What is the right way to do this?
If your coordinates are not part of any spatial reference system, use SRID=0.
(On older PostGIS versions this was SRID=-1.)
Imagine an enormous collection of locations of interest. Given any point on the map, we would like to list all such locations within, say, 5 km of it.
This seems like a simple enough idea that I expect there is already a well-thought-out solution, but I don't know how to Google for it.
How would the location data be stored in a database to make searching fast? I'm assuming that a SQL database (which is based around relational tabular data) won't work, since I don't see an obvious way to use SQL's tabular nature to filter out most locations farther than 5 km away and keep each query fast.
Maybe databases like Postgres have some kind of spatial extension that allows what I am asking to be done fast. If so, how is such a thing implemented? And if one were implementing a database from scratch for spatial queries like mine, how would they be implemented?
The spatial extension for Postgres is called PostGIS. It has special data types to represent maps and locations, and it has special indexes (GiST-based spatial indexes) to speed up queries on spatial data.
Here is the list of PostGIS Frequently Asked Questions; it has an answer to your question.
http://postgis.net/docs/manual-2.1/PostGIS_FAQ.html
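To make the idea concrete, here is a minimal pure-Python sketch of the distance predicate such a radius query evaluates. PostGIS does this natively (e.g. with ST_DWithin) and uses a spatial index instead of the linear scan shown here; this is just to illustrate the computation.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def within_radius(points, center, radius_km=5.0):
    """Naive linear scan over all points; a spatial index lets the
    database skip almost all of them instead."""
    return [p for p in points if haversine_km(*center, *p) <= radius_km]
```

The whole point of a spatial index is to avoid calling the distance function on every row: the index prunes candidates down to those near the query point first.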
I am trying to understand the structure of PostGIS, but the more I read, the more confused I get. For starters, I am a complete newbie to geospatial databases. I would like to understand the architecture of such a database, the most commonly used designs for such databases, and the design implications. The real basic stuff, as I have literally no clue about these things. Can someone point me in some direction or to such material?
Since I was looking into PostGIS, I also came across the libraries GEOS (a geometry engine) and Proj.4. I know what PostGIS uses them for, but I am not sure I understand it correctly. As far as I know, GEOS provides the geometric data types, a way of indexing them, and a way to query the data. Calculations on geometric data types are based on planes and are usually fast. There is also a geography data type; calculations on it are based on a spheroid and are much slower than their geometric counterparts.

Now come the various projections. I don't understand what they are for. I believe a projection converts the globe into a 2D plane so that geometric calculations can be used instead of geographic ones; since geographic calculations are more expensive, projection is probably a necessary evil. If so, are the calculations accurate? I don't know if this is a correct understanding of the concepts. It would really help if someone could direct me towards any valuable material for understanding geospatial database design, PostGIS, and the libraries concerned.
Thank you!
PS - I am not looking for how to use PostGIS; I want to know more about how it is implemented, the thought process behind implementing certain features, the use of these libraries, and so on. Using PostGIS seems simple enough, and I am not interested in that. :)
I have been playing around with using graphs to analyze big data. It's been working great and it's really fun, but I'm wondering what to do as the data gets bigger and bigger.
Let me know if there's any other solution, but I thought of trying HBase because it scales horizontally and I can get Hadoop to run analytics on the graph (most of my code is already written in Java). However, I'm unsure how to structure a graph in a NoSQL database. I know each node can be an entry in the database, but I'm not sure how to model edges and add properties to them (like names of nodes, attributes, PageRank, weights on edges, etc.).
Seeing how HBase/Hadoop are modeled after BigTable and MapReduce, I suspect there is a way to do this, but I'm not sure how. Any suggestions?
Also, does what I'm trying to do make sense? Or are there better solutions for big-data graphs?
You can store an adjacency list in HBase/Accumulo in a column-oriented fashion. I'm more familiar with Accumulo (HBase terminology may differ slightly), so you might use a schema similar to:
SrcNode(RowKey) EdgeType(CF):DestNode(CFQ) Edge/Node Properties(Value)
Where CF=ColumnFamily and CFQ=ColumnFamilyQualifier
You might also store node/vertex properties as separate rows using something like:
Node(RowKey) PropertyType(CF):PropertyValue(CFQ) PropertyValue(Value)
The PropertyValue could be either in the CFQ or the Value
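A quick way to see how that layout behaves is to mock the key structure with plain dicts. This is a toy model of the schema above, not a real Accumulo/HBase client, and all names are illustrative:

```python
# Row key -> {(column_family, qualifier): value}
table = {}

def put(row, cf, cq, value):
    table.setdefault(row, {})[(cf, cq)] = value

# Edges: SrcNode(RowKey)  EdgeType(CF):DestNode(CFQ)  EdgeProperties(Value)
put("nodeA", "follows", "nodeB", {"weight": 0.8})
put("nodeA", "follows", "nodeC", {"weight": 0.1})

# Node properties as separate entries: Node(RowKey)  PropertyType(CF):Value
put("nodeA", "prop", "name", "Alice")
put("nodeA", "prop", "pagerank", 0.27)

def neighbours(row, edge_type):
    """All destination nodes for one edge type -- a single-row scan."""
    return [cq for (cf, cq) in table.get(row, {}) if cf == edge_type]
```

The key property of this layout is that fetching a node's outgoing edges is one row lookup, which is exactly what a column-oriented store is fast at.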
From a graph-processing perspective, as mentioned by @Arnon Rotem-Gal-Oz, you could look at Apache Giraph, which is an implementation of Google Pregel. Pregel is the method Google uses for large-scale graph processing.
Using HBase/Accumulo as input to Giraph was recently (7 Mar 2012) submitted as a new feature request: HBase/Accumulo Input and Output formats (GIRAPH-153).
You can store the graph in HBase as an adjacency list. For example, each row would have columns for general properties (name, PageRank, etc.) and a list of keys of adjacent nodes (if it is a directed graph, then just the nodes you can reach from this node, or an additional column with the direction of each edge).
Take a look at Apache Giraph (you can also read a little more about it here); while it isn't about HBase per se, it is about handling graphs on Hadoop.
Also, you may want to look at Hadoop 0.23 (and up), as the YARN engine (aka MapReduce 2) is more open to non-MapReduce algorithms.
I would not use HBase in the way "Binary Nerd" recommended, as HBase does not perform very well when handling multiple column families.
Best performance is achieved with a single column family. (A second one should only be used if you very often access the content of only one column family and the data stored in the other is very large.)
There are graph databases built on top of HBase that you could try and/or study.
Apache S2Graph
provides a REST API for storing and querying graph data represented as edges and vertices. There you can find a presentation where the construction of row/column keys is explained, along with an analysis of how operation performance influenced (and was influenced by) the design.
Titan
can use other storage backends besides HBase, and has integration with analytics frameworks. It is also designed with big data sets in mind.
We're developing an application that has to query 3D shapes (and query based on other parameters as well) within a bounding box. The number of shapes is more than I want to keep in memory, so I need a database to handle it.
Specifically, our primary operations are inserts and queries. We never modify existing data.
Because it's a desktop application, I'm trying to avoid the separate-server setups of PostgreSQL and MySQL, hoping for something simpler to deploy. I found SpatiaLite, but it does not index on the third dimension, so it won't work.
I tried searching for a kd-tree database but haven't found anything yet. I know there are kd-tree implementations, but getting one into database form ourselves would take a lot of effort, so I'm trying to see if there is something already out there.
The application is in Haskell, but if we have to integrate with some other language, we might deal with that.
SQLite R*Trees
Given a query rectangle, an R-Tree is able to quickly find all entries that are contained within the query rectangle or which overlap the query rectangle. This idea is easily extended to three dimensions for use in CAD systems.
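As a minimal sketch of how this looks in practice, here is a 3D R*Tree queried through Python's built-in sqlite3 module (assuming your SQLite build was compiled with the R*Tree module, which most standard builds are; the table name and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A 3D R*Tree: an id plus min/max pairs for each of the three dimensions.
conn.execute("""CREATE VIRTUAL TABLE shapes USING rtree(
    id, minX, maxX, minY, maxY, minZ, maxZ)""")

# Insert the bounding boxes of two shapes.
conn.execute("INSERT INTO shapes VALUES (1, 0, 1, 0, 1, 0, 1)")
conn.execute("INSERT INTO shapes VALUES (2, 5, 6, 5, 6, 5, 6)")

# All shapes whose box overlaps the query box (0.5..2 on every axis);
# the R*Tree answers this without scanning every row.
rows = conn.execute("""SELECT id FROM shapes
    WHERE maxX >= 0.5 AND minX <= 2
      AND maxY >= 0.5 AND minY <= 2
      AND maxZ >= 0.5 AND minZ <= 2""").fetchall()
```

Since SQLite is an embedded, single-file database, this sidesteps the server deployment concern entirely.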
I would respectfully challenge your attempt to avoid PostgreSQL/MySQL. I'm experienced in PostgreSQL and it does the job you want and it is not difficult to administer. Certainly, anything else you find is not going to have the level of development and testing PostgreSQL does - so why bother?
Well, let me explain this briefly:
1. I want to build a website that provides location-based services, like http://fireeagle.yahoo.net/.
2. I guess most of these services have something to do with longitude and latitude.
3. Are there any particular databases/datastores/data structures that fit such apps well? I mean, easy to store longitude and latitude in, and easy to compute with and use.
I am new to this, and any feedback is welcome.
Spatial extensions to relational database systems provide storage of, and indexed access to, the geography/geometry datatypes. They allow you to perform spatial joins and all sorts of spatial queries. In short, they are exactly what you need.
If you are using the open-source stack, I would recommend PostGIS, the spatial extension to PostgreSQL. If you are using the MS stack, try the spatial extensions to SQL Server 2008.
MySQL has a spatial extension, with tutorials here. The basic idea for getting fast queries is to give the table a column with a spatial index: an R-tree index that's fast for range queries such as "give me points near this point."
Of course, there's Postgres with PostGIS and you could pay for this service from companies like SimpleGeo.
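The R-tree range query typically uses a bounding box as a cheap prefilter, and converting a search radius into a lat/lon box is only a few lines. Here is a rough sketch of that conversion (it ignores edge cases near the poles and the antimeridian, and the constant is approximate):

```python
from math import cos, radians

KM_PER_DEG_LAT = 111.32  # approximate; one degree of latitude in km

def bounding_box(lat, lon, radius_km):
    """Lat/lon box enclosing a radius around a point. This is a
    prefilter only: candidates inside the box still need an exact
    distance check afterwards."""
    dlat = radius_km / KM_PER_DEG_LAT
    # Longitude degrees shrink with latitude, hence the cos() factor.
    dlon = radius_km / (KM_PER_DEG_LAT * cos(radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)
```

The spatial index answers the box query quickly; the exact distance test then discards the corner points of the box that fall outside the circle.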
I would recommend that you consider GeoDjango.
It is very nice, as it merges the simplicity of Python/Django with the power of PostGIS. But it can also be complex and provide more features than you need, thereby wasting your time.
If you don't have particular needs, there is another, simpler solution to use with Django or Python alone: Geopy. While it does not add spatial extensions to a database, it allows you to perform geospatial calculations using generic data structures (and any database). You can calculate distances and do (reverse) geocoding. Take a look at the Getting Started page, but also directly at the code, as it is well documented. I'm using it for a dynamic carpooling project and it works very well.
Both solutions fit well with the Django framework, so you could easily develop a website around the services provided.