We are beginning to place a fair amount of our spatial data in Snowflake. Linear data with M (measure) values does not appear to be supported in Snowflake, even though measures are part of the OGC spec. When we attempt to import data with M values (e.g. (X Y Z M)), Snowflake currently throws an error. Has anyone had experience with this, or does anyone know of a useful workaround? We really don't want to lose our measure values.
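The workaround we are currently considering (the table and column names below are only illustrative) is to strip the M values out of the WKT before loading and carry them in a parallel ARRAY column, one measure per vertex, re-attaching them in the application when needed. Roughly:

-- Sketch: geometry stored without M, measures kept alongside it
CREATE TABLE road_segments (
    segment_id NUMBER,
    shape      GEOMETRY,   -- X Y (Z) only
    m_values   ARRAY       -- one M value per vertex, in vertex order
);

INSERT INTO road_segments
SELECT 1,
       TO_GEOMETRY('LINESTRING(0 0, 10 0, 20 5)'),   -- WKT with M stripped
       ARRAY_CONSTRUCT(0.0, 10.0, 21.2);             -- the M values, kept separately

This keeps the measures, but we would much rather store the geometry with M intact if anyone knows a way.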
I have a constant number of columns; they correspond to the real-time coordinates of a few, or maybe even a few hundred, points in space (a constant id plus the x, y coordinates of a pose detected in an OpenCV image). The frames are analyzed grid by grid, so a lot of data arrives at once.
I read that Redis runs in RAM and that you can set a time-to-live so the data is deleted automatically.
Cassandra stores column data next to each other, so it should be suitable for a fixed set of coordinates.
It would be nice if you could perform operations on them such as subtraction or multiplication.
I'm looking for a database that will be able to quickly write and read this data and at the same time will not be performance-intensive.
thanks
My application consumes data from different sources, which should be stored in a DB for reporting purposes. The fields are the same across sources, and there are only about a half-dozen of them, but the values are sometimes reported in different units.
e.g. to get speed in m/s:
from TypeA, speed = (reported value - 1024) * 0.1
from TypeB, speed = (reported value) * 0.001
I am considering whether to hard-code this conversion in the application logic and write the calculated values into the DB; to write the raw received values into the DB and use views to union TypeA records with TypeB records, adding the simple calculations there; or to rely on the reporting queries themselves to do the same thing.
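For reference, the view-based option would look roughly like this (raw_type_a, raw_type_b and the column names are illustrative; the formulas are the ones above):

-- Raw readings stored per source, normalized to m/s in one view
CREATE VIEW speed_readings AS
SELECT reading_time, device_id, (raw_speed - 1024) * 0.1 AS speed_ms
FROM raw_type_a
UNION ALL
SELECT reading_time, device_id, raw_speed * 0.001 AS speed_ms
FROM raw_type_b;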
The core DB table might grow by as much as a million rows per day, but the data is simple, so my initial thought was to keep the DB as simple as possible, more along data-warehouse lines.
But as a coder rather than a DBA, I have no real idea whether this is trivial enough that a good DB will have no problem with it.
I've been using Lucene to great effect to provide a solution where my users can query a lot of records (100 million+) very quickly. Users have a large form with a lot of different fields they can choose from. They also have an "advanced search" option where they can construct their own queries which support nested logic with AND, OR and NOT operators.
I use MSSQL as my main data store and then I index the data in Lucene. A Lucene query returns me a list of IDs that I then query directly from the MSSQL database, thus avoiding complicated (slow) query plans that would be the result of trying the equivalent query directly against the database. With a bit of planning and design, Lucene has shown itself to be highly capable of performing very fast queries where the query has a significant amount of complexity e.g. ((A AND B) OR (B AND C AND D)) OR (A[X TO Y] AND K) OR (Q,W,E,R,T,Y,U,I,O). You get the picture.
The problem I have run into is a relational one. When a record has related attributes K, each of which has its own attributes J, and a user tries to perform a search specifying multiple conditions on J against a single K, with more than one of those conditions being numerical, the need for a relational store suddenly becomes apparent: there isn't really an effective way to tokenize the relationship between one numerical attribute and another.
Obviously there are some great solutions out there for storing huge amounts of data and still being fast to query at a basic level. What I want to know is if you have any recommendations as to which of these solutions is also capable of performing very fast lookups when the query often has a certain level of complexity as described earlier.
As best I can tell, there's no really good unified solution for this. My solution is:
- MongoDB for big data storage and fast key-based lookups
- Lucene for super fast, complex queries
In my index I store document IDs that I then retrieve from the database as needed.
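The retrieval step itself is nothing special; once Lucene hands back the IDs it is just a keyed lookup along these lines (table and column names are illustrative):

-- Fetch the full records for the IDs returned by the Lucene query
SELECT *
FROM records
WHERE record_id IN (101, 205, 309);   -- IDs taken from the Lucene result set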
I have a real estate application and a "house" contains the following information:
house:
- house_id
- address
- city
- state
- zip
- price
- sqft
- bedrooms
- bathrooms
- geo_latitude
- geo_longitude
I need to perform an EXTREMELY fast (low latency) retrieval of all homes within a geo-coordinate box.
Something like the SQL below (if I were to use a database):
SELECT * FROM houses
WHERE latitude BETWEEN xxx AND yyy
AND longitude BETWEEN www AND zzz
Question: What would be the quickest way for me to store this information so that I can perform the fastest retrieval of data based on latitude & longitude? (e.g. database, NoSQL, memcache, etc)?
This is a typical query for a Geographical Information System (GIS) application. Many of these are solved by using quad-tree or similar spatial indices. The tiling mentioned in another answer is how these often end up being implemented.
If an index containing the coordinates could fit into memory and the DBMS had a decent optimiser, then a table scan could provide a Cartesian distance from any point of interest with tolerably low overhead. If this is too slow, then the query could be pre-filtered by comparing each coordinate axis separately before doing the full distance calculation.
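A sketch of that pre-filtering idea, using the columns from the question and treating the radius as a planar approximation in degrees (:lat, :lon and :radius are bind parameters):

-- Cheap box filter first, distance test only on the survivors
SELECT house_id
FROM houses
WHERE geo_latitude  BETWEEN :lat - :radius AND :lat + :radius
  AND geo_longitude BETWEEN :lon - :radius AND :lon + :radius
  AND (geo_latitude - :lat) * (geo_latitude - :lat)
    + (geo_longitude - :lon) * (geo_longitude - :lon) <= :radius * :radius;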
MongoDB supports geospatial indexes, but there are also ways to reduce the computation time for things like this. Depending on how your data is arranged, you can place houses in identifiable 'tiles', fetch all houses for a given tile and, from that reduced dataset, sort based on distance from whatever coordinates you have.
Depending on how many tiles there are, you can use bitmasks to find houses that may be near or overlap multiple tiles.
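A rough sketch of the tiling idea in plain SQL (the 0.01-degree tile size and the column names are assumptions):

-- Assign each house to a coarse tile once, index the tile key,
-- then a lookup only touches the few tiles covering the search box.
ALTER TABLE houses ADD COLUMN tile_x INTEGER;
ALTER TABLE houses ADD COLUMN tile_y INTEGER;

UPDATE houses
SET tile_x = CAST(FLOOR(geo_longitude / 0.01) AS INTEGER),
    tile_y = CAST(FLOOR(geo_latitude  / 0.01) AS INTEGER);

CREATE INDEX idx_houses_tile ON houses (tile_x, tile_y);

-- Fetch everything in the covering tiles, then refine in the application
SELECT *
FROM houses
WHERE tile_x BETWEEN :min_tile_x AND :max_tile_x
  AND tile_y BETWEEN :min_tile_y AND :max_tile_y;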
I'm going to assume that you're doing lots more reads than writes, and you don't need to have your database distributed across dozens of machines. If so, you should go for a read-optimized database like sqlite (my personal preference) or mysql, and use exactly the SQL query you suggest.
Most (not all) NoSQL databases end up being overly complicated for queries of this sort, since they're better at looking up exact values in their indexes rather than ranges.
It's nice that you're looking for a bounding box instead of Cartesian distance; the latter would be harder for a SQL database to optimize (although you could narrow it to a bounding box first, then do the slower Cartesian distance calculation).
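If you do go that route, the main work is giving the optimizer something to use, for example (assuming the house table from the question; the R*Tree part is sqlite-specific and optional):

-- A plain composite index already helps the exact query from the question
CREATE INDEX idx_houses_lat_lon ON houses (geo_latitude, geo_longitude);

-- sqlite's optional R*Tree module is purpose-built for box lookups;
-- points are stored with min = max on each axis
CREATE VIRTUAL TABLE houses_rtree USING rtree(house_id, min_lat, max_lat, min_lon, max_lon);

SELECT house_id
FROM houses_rtree
WHERE min_lat >= xxx AND max_lat <= yyy
  AND min_lon >= www AND max_lon <= zzz;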
I am designing a new laboratory database. For some tests, I have several waveforms with ~10,000 data points acquired simultaneously. In the application (written in C), the waveforms are stored as an array of floats.
I believe I would like to store each waveform as a BLOB.
Questions:
Can the data in a BLOB be structured in such a way that Oracle can work with the data itself using only SQL or PL/SQL?
- Determine max, min, average, etc
- Retrieve index when value first exceeds 500
- Retrieve 400th number
- Create BLOB which is a derivative of first BLOB
NOTE: This message is a sub-question of Storing Waveforms in Oracle.
- Determine max, min, average, etc
- Retrieve index when value first exceeds 500
- Retrieve 400th number
The relational data model was designed for this kind of analysis - and Oracle's SQL is more than capable of doing this, if you model your data correctly. I recommend you focus on transforming the array of floats into tables of numbers - I suspect you'll find that the time taken will be more than compensated for by the speed of performing these sorts of queries in SQL.
The alternative is to try to write SQL that will effectively do this transformation at runtime anyway - every time the SQL is run; which will probably be much less efficient.
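To make that concrete, here is a sketch of the kind of model and queries meant here (table and column names are illustrative):

-- One row per sample instead of one opaque BLOB per waveform
CREATE TABLE waveform_point (
    waveform_id NUMBER       NOT NULL,
    idx         NUMBER       NOT NULL,   -- 1-based sample index
    val         BINARY_FLOAT NOT NULL,
    CONSTRAINT pk_waveform_point PRIMARY KEY (waveform_id, idx)
);

-- Max, min, average
SELECT MAX(val), MIN(val), AVG(val) FROM waveform_point WHERE waveform_id = :id;

-- First index where the value exceeds 500
SELECT MIN(idx) FROM waveform_point WHERE waveform_id = :id AND val > 500;

-- The 400th number
SELECT val FROM waveform_point WHERE waveform_id = :id AND idx = 400;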
You may also wish to consider the VARRAY type. You do have to work with the entire array (no retrieval of subsets, partial updates, etc.), but you can define a max length and Oracle will store only what you use. You can declare VARRAYs of most any datatype, including BINARY_FLOAT or NUMBER. BINARY_FLOAT will minimize your storage but suffers from some minor precision issues (which can be important in financial applications). It is stored in IEEE 754 format.
Since you're planning to manipulate the data with PL/SQL I might back off from the BLOB design. VARRAYs will be more convenient to use. BLOBs would be very convenient to store an array of raw C floats for later use in another C program.
See PL/SQL Users Guide and Reference for how to use them.
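A minimal sketch of the VARRAY approach (the names and the 10,000-element cap are assumptions taken from the question):

-- Collection type sized to the expected waveform length
CREATE TYPE waveform_t AS VARRAY(10000) OF BINARY_FLOAT;
/

CREATE TABLE waveform (
    waveform_id NUMBER PRIMARY KEY,
    samples     waveform_t
);

-- PL/SQL reads the whole array and indexes into it
DECLARE
    v_samples waveform_t;
BEGIN
    SELECT samples INTO v_samples FROM waveform WHERE waveform_id = 1;
    DBMS_OUTPUT.PUT_LINE('400th sample: ' || v_samples(400));
END;
/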
I think that you could probably create PL/SQL functions that take the BLOB as a parameter and return information about it.
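For example, something along these lines (a sketch only; it assumes the BLOB is a packed array of 4-byte IEEE 754 floats and that the byte order matches what the C program wrote, which you would need to verify):

-- Returns the n-th (1-based) float stored in a BLOB of packed 4-byte floats
CREATE OR REPLACE FUNCTION get_sample(p_wave IN BLOB, p_n IN PLS_INTEGER)
    RETURN BINARY_FLOAT
IS
    c_width CONSTANT PLS_INTEGER := 4;   -- bytes per float
    v_raw   RAW(4);
BEGIN
    v_raw := DBMS_LOB.SUBSTR(p_wave, c_width, (p_n - 1) * c_width + 1);
    RETURN UTL_RAW.CAST_TO_BINARY_FLOAT(v_raw, UTL_RAW.MACHINE_ENDIAN);
END;
/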
If you could use XMLType for the field, then you can definitely parse in PL/SQL and write the functions you want.
http://www.filibeto.org/sun/lib/nonsun/oracle/11.1.0.6.0/B28359_01/appdev.111/b28369/xdb10pls.htm
Of course, XML will be quite a bit slower, but if you can't parse the binary data, it's an alternative.