I'm trying to pull historical pricing data from CoinGecko's free API to use in a Google Sheet. It presents OHLC numbers in the following format:
[
[1589155200000,0.05129,0.05129,0.047632,0.047632],
[1589500800000,0.047784,0.052329,0.047784,0.052329],
[1589846400000,0.049656,0.053302,0.049656,0.053302],
...
]
As you can see, this isn't typical JSON format since there are no property names. So that everyone is on the same page, for this data the properties of each subarray in order are Time (in UNIX epoch format), Open Price, High Price, Low Price, and Close Price.
I'm using the ImportJSON code found here to try and pull this data, but it does not work. Instead of putting each subarray into a separate row, split into columns for the 5 properties, it prints everything out into a single cell like so:
1589155200000,0.05129,0.05129,0.047632,0.047632,1589500800000,0.047784,0.052329,0.047784,0.052329,1589846400000,0.049656,0.053302,0.049656,0.053302,...
This is incredibly unhelpful. I'm trying to avoid using a paid API add-on since I really don't want to have to pay the frankly exorbitant fees they want to charge, but I can't figure out a way to get ImportJSON to play nicely with this data. Does anyone know of a solution?
It's simpler than that: your data is already an array structure. I put
[
[1589155200000,0.05129,0.05129,0.047632,0.047632],
[1589500800000,0.047784,0.052329,0.047784,0.052329],
[1589846400000,0.049656,0.053302,0.049656,0.053302]
]
in A1, and I get the individual values this simpler way:
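function myArray() {
  var f = SpreadsheetApp.getActiveSheet();
  // JSON.parse is safer than eval for plain JSON text like the array in A1.
  var result = JSON.parse(f.getRange('A1').getValue());
  // Write one subarray per row, starting at row 2, column 1.
  f.getRange(2, 1, result.length, result[0].length).setValues(result);
}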
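If a header row and a readable date column are wanted, the parsed array can be reshaped before it is written to the sheet. The following is a minimal sketch in plain JavaScript. The name `ohlcToRows` and the header labels are hypothetical, and it assumes each subarray is [time in epoch milliseconds, open, high, low, close] as described in the question.

```javascript
// Hypothetical helper: turns CoinGecko-style OHLC JSON text into sheet rows.
// Assumes each entry is [epochMillis, open, high, low, close].
function ohlcToRows(jsonText) {
  const data = JSON.parse(jsonText);
  const header = ['Time', 'Open', 'High', 'Low', 'Close'];
  const rows = data.map(function (entry) {
    // Convert the epoch-millisecond timestamp into a Date object
    // and keep the four price values unchanged.
    return [new Date(entry[0])].concat(entry.slice(1));
  });
  return [header].concat(rows);
}
```

In Apps Script the result could then be written with something like `sheet.getRange(1, 1, rows.length, rows[0].length).setValues(rows)`, where `sheet` is whichever sheet object is in use.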
Bitrix returns at most 50 entries per request in the array list of all my data. This is a security restriction and, from what I've seen, it can't be changed. To export the data from Bitrix to my data warehouse (currently going through transformation inside Pentaho), I need to fetch close to 50,000 arrays, only 50 at a time, and they come back in no particular ID order. I need help filtering by ID in some order so that my requests become easier. These are my parameters:
Print
If anyone knows what kind of selection or filter I could use, I'd really appreciate it!
TL;DR: MapReduce or POST request?
What is the correct(=most efficient) way to fetch the latest n data points of multiple sensors, from Cloudant or equivalent database?
Sensor data is stored in individual documents like this:
{
"_id": "2d26dbd8e655ae02bdab611afc92b6cf",
"_rev": "1-a64448521f05935b915e4bee12328e84",
"date": "2017-06-20T15:59:50.509Z",
"name": "Sensor01",
"temperature": 24.5,
"humidity": 45.3,
"rssi": -33
}
I want to fetch the latest 10 documents from sensor01-sensor99 so I can feed them to the UI.
I have discovered a few options:
1. Use map reduce function
Reduce each sensor data to array under sensor01, sensor02, etc...
E.g.
Map:
function (doc) {
if (doc.name && doc.temperature) emit(doc.name, doc.temperature);
}
Reduce:
function (keys, values, rereduce) {
var temp_arr=[];
for (i=0;i<values.length;i++)
{
temp_arr.push(values);
}
return temp_arr;
}
I couldn't get this to work, but I think the method should be viable.
2. Multi-document fetching
{
"queries":[
{sensor01},{sensor02},{sensor03} etc....
]};
Where each {sensor0x} is filtered using
{"startkey": [sensors[i],{}],"endkey": [sensors[i]],"limit": 5}
This way I can order documents using ?descending=true
I implemented it and it works, but I doubt whether I should use this if I have 1000 sensors with 10000 data points each.
And for hundreds of sensors I need to send a very large POST request.
Something better?
Is my architecture even correct?
Storing sensor data in individual documents, and then filling the UI by fetching all data through the REST API.
Thank you very much!
There's nothing wrong with your method of storing one reading per document, but there's no truly efficient way of getting "the last n data points" for a number of sensors.
We could create a MapReduce view with this map function:
function (doc) {
if (doc.name && doc.temperature && doc.date) {
emit([doc.name, doc.date], doc.temperature);
}
}
This creates an index ordered by name and date.
We can access the most recent readings for a single sensor by querying the view:
/_design/mydesigndoc/_view/myview?startkey=["Sensor01","2020-01-01"]&descending=true&limit=10
This fetches readings for "Sensor01" in newest-first order:
startkey & endkey are reversed when doing descending=true
descending=true means reverse order
limit is the number of readings required (or n in your parlance)
This is a very efficient use of Cloudant/CouchDB, but it only returns the last n readings for a single sensor. To retrieve other sensors' data, additional API calls would be required.
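To illustrate the "one call per sensor" pattern, here is a minimal sketch that builds the query path for each sensor against the [name, date] view above. The helper names are hypothetical. The design document and view names are copied from the example URL, and the range uses `[name, {}]` as the upper bound (as in the question) on the assumption that an object collates after any date string.

```javascript
// Hypothetical helper: builds the view query for the newest `n` readings of one sensor.
function latestReadingsPath(sensorName, n) {
  const params = new URLSearchParams({
    // With descending=true the scan starts at the high end of the key range,
    // so startkey is the upper bound and endkey the lower bound.
    startkey: JSON.stringify([sensorName, {}]),
    endkey: JSON.stringify([sensorName]),
    descending: 'true',
    limit: String(n),
  });
  return '/_design/mydesigndoc/_view/myview?' + params.toString();
}

// One path per sensor; each would be sent as its own GET request.
function latestReadingsPaths(sensorNames, n) {
  return sensorNames.map(function (name) {
    return latestReadingsPath(name, n);
  });
}
```

Bounding the range with endkey keeps a sensor with fewer than n readings from spilling over into the previous sensor's rows.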
Creating an index like this:
function (doc) {
if (doc.name && doc.temperature && doc.date) {
emit(doc.date, doc.temperature);
}
}
orders each reading by date. You can then retrieve the newest n readings with:
/_design/mydesigndoc/_view/myview?startkey="2020-01-01"&descending=true&limit=200
If all of your sensors are saving data at the same rate, then simply using a larger limit should get you the latest readings of all sensors.
This too is an efficient use of CouchDB/Cloudant.
You may also want to look at the built-in reducers (_count, _sum and _stats) to get the database to aggregate readings for you. They are a great way to create year/month/day groupings of IoT data.
In general, I would recommend against custom reducers: they are many times less efficient than the built-in reducers, which are written in Erlang.
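As a rough sketch of the year/month/day grouping idea, a map function can emit a key of [name, year, month, day] and be paired with the built-in `_stats` reducer. The names `dayKey` and `mapReading` are hypothetical, and the code assumes `date` is always an ISO-8601 string like the one in the sample document. In a real design document the helper would have to be inlined into the map function, since each map function is stored as a standalone string.

```javascript
// Turn an ISO-8601 timestamp such as "2017-06-20T15:59:50.509Z"
// into a [year, month, day] array of numbers.
function dayKey(isoDate) {
  return isoDate.slice(0, 10).split('-').map(Number);
}

// Map function body; `emit` is supplied by CouchDB/Cloudant at index time.
function mapReading(doc) {
  if (doc.name && doc.date && typeof doc.temperature === 'number') {
    // Key: [sensor name, year, month, day]; value: the temperature reading.
    emit([doc.name].concat(dayKey(doc.date)), doc.temperature);
  }
}
```

With `_stats` as the reduce function, querying the view with `group_level=4` would return sum, count, min and max per sensor per day, and `group_level=3` per sensor per month.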
I have a mongo collection with documents that have a schema structured like the following:
{ _id : bla,
fname : foo,
lname : bar,
subdocs [ { subdocname : doc1
field1 : one
field2 : two
potentially_huge_array : [...]
}, ...
]
}
I'm using the ruby mongo driver that currently does not support elemMatch. I use an aggregation when extracting from subdocs via a project, unwind and match pipeline.
What I would now like to do is to page results from the potentially_huge_array array contained in the subdocument. I have not been able to figure out how to grab just a subset of the array without dragging the entire subdoc, huge array and all, out of the db into my app.
Is there some way to do this?
Would a different schema be a better way to handle this?
Depending on how huge is huge, you definitely don't want it embedded into another document.
The main reason is that unless you always want the array returned with the document, you probably don't want to store it as part of the document. How you can store it in another collection would depend on exactly how you want to access it.
Reviewing the types of queries you most often perform on your data will usually suggest the best schema - one that will allow you to be efficient about number of queries, the amount of data returned and ease of indexing the data.
If your field is really huge and changes often, just place it in a separate collection.
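One possible shape for the separate-collection approach is sketched below in plain JavaScript. The function name and the `parent_id`, `pos` and `value` fields are hypothetical. The idea is that every array element becomes its own small document that points back to its parent and remembers its position, so a page of elements can be read without touching the parent.

```javascript
// Hypothetical reshaping step: move each subdoc's large array into separate
// "item" documents and return the slimmed-down parent alongside them.
function splitHugeArrays(doc) {
  const items = [];
  const subdocs = doc.subdocs.map(function (sub) {
    sub.potentially_huge_array.forEach(function (value, pos) {
      items.push({ parent_id: doc._id, subdocname: sub.subdocname, pos: pos, value: value });
    });
    // Copy every field except the array into the parent's subdoc.
    const rest = Object.assign({}, sub);
    delete rest.potentially_huge_array;
    return rest;
  });
  return { parent: Object.assign({}, doc, { subdocs: subdocs }), items: items };
}
```

If the items were stored in a collection of their own (called `items` here purely for illustration), a page could be read with a query along the lines of `db.items.find({parent_id: id, subdocname: "doc1"}).sort({pos: 1}).skip(20).limit(10)`.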
I have a table called devicesegments, each row of which contains a large array called devices. Owing to the size of the device array, I've been asked not to include it in my query for a page that lists all devicesegments, but only include their count. Is this possible?
What I was doing before :
A simple db.devicesegments.find()
What I'm doing now :
db.devicesegments.find({}, { devices : 0 })
What I want to achieve :
db.devicesegments.find({}, { devices : 0, devices.length : 1 })
Something like a COUNT(devices) AS device_count!
Ashkay, there's no way to do this with Mongo currently. As @rompetroll says, your application should keep a "count" field on each document, and carefully $inc it whenever you change the number of entries in the array. Then when you query for the document, exclude the array like:
db.collection.find({}, {devices:0})
If you're willing to run MongoDB 2.1, which is a development release, the aggregation framework provides a means to calculate the array size within a query:
http://www.mongodb.org/display/DOCS/Aggregation+Framework
Since there is no way to currently do this, without including a new device_count in my table, the temporary fix that I applied was to fetch all the data from the database, along with the devices array, and for each row, add a field for devices.length and then remove the devices array before sending the data across.
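The temporary fix described above could look roughly like this on the application side. This is only a sketch: `withDeviceCounts` and the `device_count` field name are assumptions, and it expects the documents to have already been fetched with their `devices` arrays.

```javascript
// Replace each segment's `devices` array with a `device_count` field
// before the data is sent on. The input documents are not modified.
function withDeviceCounts(segments) {
  return segments.map(function (segment) {
    const copy = Object.assign({}, segment);
    copy.device_count = Array.isArray(segment.devices) ? segment.devices.length : 0;
    delete copy.devices;
    return copy;
  });
}
```

This still pulls every array out of the database, so it only saves bandwidth between the application and the client, not between the database and the application.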
I've been using SQL Server to store historical time series data for a couple hundred thousand objects, observed about 100 times per day. I'm finding that queries (give me all values for object XYZ between time t1 and time t2) are too slow (for my needs, slow is more than a second). I'm indexing by timestamp and object ID.
I've entertained the thought of using something like a key-value store such as MongoDB instead, but I'm not sure if this is an "appropriate" use of this sort of thing, and I couldn't find any mentions of using such a database for time series data. Ideally, I'd be able to do the following queries:
retrieve all the data for object XYZ between time t1 and time t2
do the above, but return one data point per day (first, last, closest to time t...)
retrieve all data for all objects for a particular timestamp
The data should be ordered, and ideally it should be fast to write new data as well as update existing data.
It seems like my desire to query by object ID as well as by timestamp might necessitate having two copies of the database indexed in different ways to get optimal performance. Does anyone have any experience building a system like this, with a key-value store, or HDF5, or something else? Or is this totally doable in SQL Server and I'm just not doing it right?
It sounds like MongoDB would be a very good fit. Updates and inserts are super fast, so you might want to create a document for every event, such as:
{
object: XYZ,
ts : new Date()
}
Then you can index the ts field and queries will also be fast. (By the way, you can create multiple indexes on a single database.)
How to do your three queries:
retrieve all the data for object XYZ
between time t1 and time t2
db.data.find({object : XYZ, ts : {$gt : t1, $lt : t2}})
do the above, but return one data
point per day (first, last, closest to
time t...)
// first
db.data.find({object : XYZ, ts : {$gt : new Date(/* start of day */)}}).sort({ts : 1}).limit(1)
// last
db.data.find({object : XYZ, ts : {$lt : new Date(/* end of day */)}}).sort({ts : -1}).limit(1)
For closest to some time, you'd probably need a custom JavaScript function, but it's doable.
retrieve all data for all objects for
a particular timestamp
db.data.find({ts : timestamp})
Feel free to ask on the user list if you have any questions; someone else might be able to think of an easier way of getting closest-to-a-time events.
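One way to approximate "closest to time t" without a server-side function is to fetch two candidates, the last event at or before t and the first event at or after t, and compare them in the application. The sketch below shows only the comparison step. The name `closestToTime` is hypothetical, and it assumes each document carries a Date in its `ts` field as in the example above.

```javascript
// Picks whichever candidate document is nearer in time to `target`.
// `before` is the last document at or before the target and `after` the
// first one at or after it; either may be null when no such document exists.
function closestToTime(before, after, target) {
  if (!before) return after || null;
  if (!after) return before;
  const gapBefore = target.getTime() - before.ts.getTime();
  const gapAfter = after.ts.getTime() - target.getTime();
  // On a tie the earlier document wins.
  return gapBefore <= gapAfter ? before : after;
}
```

The two candidates could come from queries shaped like the first/last examples, for instance `{ts : {$lte : t}}` sorted by `{ts : -1}` with `limit(1)`, and `{ts : {$gte : t}}` sorted by `{ts : 1}` with `limit(1)`.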
This is why databases specific to time series data exist - relational databases simply aren't fast enough for large time series.
I've used Fame quite a lot at investment banks. It's very fast but I imagine very expensive. However, if your application requires the speed it might be worth looking into.
There is an open source time series database under active development (.NET only for now) that I wrote. It can store massive amounts (terabytes) of uniform data in a "binary flat file" fashion. All usage is stream-oriented (forward or reverse). We actively use it for stock tick storage and analysis at our company.
I am not sure this will be exactly what you need, but it will allow you to get the first two points - get values from t1 to t2 for any series (one series per file) or just take one data point.
https://code.google.com/p/timeseriesdb/
// Create a new file for MyStruct data.
// Use BinCompressedFile<,> for compressed storage of deltas
using (var file = new BinSeriesFile<UtcDateTime, MyStruct>("data.bts"))
{
    file.UniqueIndexes = true; // enforces index uniqueness
    file.InitializeNewFile(); // create file and write header
    file.AppendData(data); // append data (stream of ArraySegment<>)
}
// Read needed data.
using (var file = (IEnumerableFeed<UtcDateTime, MyStruct>) BinaryFile.Open("data.bts", false))
{
    // Enumerate one item at a time, maximum 10 items, starting at 2011-1-1
    // (can also get one segment at a time with StreamSegments)
    foreach (var val in file.Stream(new UtcDateTime(2011,1,1), maxItemCount: 10))
        Console.WriteLine(val);
}
I recently tried something similar in F#. I started with the 1 minute bar format for the symbol in question in a space-delimited file which has roughly 80,000 1 minute bar readings. The code to load and parse from disk was under 1ms. The code to calculate a 100 minute SMA for every period in the file was 530ms. I can pull any slice I want from the SMA sequence once calculated in under 1ms. I am just learning F# so there are probably ways to optimize. Note this was after multiple test runs, so it was already in the Windows cache, but even when loaded from disk it never adds more than 15ms to the load.
date,time,open,high,low,close,volume
01/03/2011,08:00:00,94.38,94.38,93.66,93.66,3800
To reduce the recalculation time I save the entire calculated indicator sequence to disk in a single file with \n delimiter, and it generally takes less than 0.5ms to load and parse when in the Windows file cache. Simple iteration across the full time series data to return the set of records inside a date range is a sub-3ms operation with a full year of 1 minute bars. I also keep the daily bars in a separate file which loads even faster because of the lower data volumes.
I use the .NET 4 System.Runtime.Caching layer to cache the serialized representation of the pre-calculated series, and with a couple of gigabytes of RAM dedicated to cache I get nearly a 100% cache hit rate, so my access to any pre-computed indicator set for any symbol generally runs under 1ms.
Pulling any slice of data I want from the indicator is typically less than 1ms so advanced queries simply do not make sense. Using this strategy I could easily load 10 years of 1 minute bar in less than 20ms.
// Parse a \n delimited file into RAM,
// then split each line on space into an
// array of tokens. Return the entire array
// as string[][]
let readSpaceDelimFile fname =
    System.IO.File.ReadAllLines(fname)
    |> Array.map (fun line -> line.Split [|' '|])
// Based on a two dimensional array
// pull out a single column for bar
// close and convert every value
// for every row to a float
// and return the array of floats.
let GetArrClose(tarr : string[][]) =
    [| for aLine in tarr do
            //printfn "aLine=%A" aLine
            let closep = float(aLine.[5])
            yield closep
    |]
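The SMA step itself is not shown above. For reference, a simple moving average over an array of closes can be computed in a single pass with a running window sum. The following is a minimal sketch, written here in JavaScript rather than F#; `simpleMovingAverage` is a hypothetical name and this is not the author's implementation.

```javascript
// Simple moving average over `values` with a window of `period` entries.
// Returns one average per full window, so the result has
// values.length - period + 1 entries (or none if the input is too short).
function simpleMovingAverage(values, period) {
  const result = [];
  let windowSum = 0;
  for (let i = 0; i < values.length; i++) {
    windowSum += values[i];
    // Drop the value that just slid out of the window.
    if (i >= period) windowSum -= values[i - period];
    // Emit an average once the first full window is available.
    if (i >= period - 1) result.push(windowSum / period);
  }
  return result;
}
```

Keeping a running sum means each new bar costs one addition and one subtraction instead of re-summing the whole window.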
I use HDF5 as my time series repository. It has a number of effective and fast compression styles which can be mixed and matched. It can be used with a number of different programming languages.
I use boost::date_time for the timestamp field.
In the financial realm, I then create specific data structures for each of bars, ticks, trades, quotes, ...
I created a number of custom iterators and used standard template library features to be able to efficiently search for specific values or ranges of time-based records.