How to create a GluonTS dataset from stock/equity OHLCV price data

I have been struggling to build a GluonTS dataset from a .csv file populated with OHLCV equity data. Does anyone know the best way to turn such a file (screenshot omitted) into a DatasetDict object (screenshot omitted) that is compatible with GluonTS and HuggingFace?
This is the notebook I'm trying to make my data compatible with:
https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/time-series-transformers.ipynb
Towards the end of the notebook, it suggests using the following notebook to make your own datasets:
https://github.com/huggingface/notebooks/blob/main/examples/time_series_datasets.ipynb
This is the notebook in which I've come closest to making it work, but I can't get it to work in a way that actually makes sense:
https://colab.research.google.com/drive/1sQvyZsTSpwhIcmcyg-c3YCfm4qJt6VhZ?usp=sharing
There should be a dataset entry for every day of price data.
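One possible starting point, sketched with the standard library only: read the OHLCV rows and assemble a single record using the field names GluonTS expects (start, target, feat_dynamic_real, item_id). The column names (Date, Open, High, Low, Close, Volume), the choice of Close as the target, and the ticker "EXAMPLE" are assumptions for illustration, not something taken from the original file.

```python
import csv
import io


def ohlcv_csv_to_gluonts_entry(csv_text, item_id):
    """Turn OHLCV rows (one per trading day) into one GluonTS-style record."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # ISO-formatted dates (YYYY-MM-DD) sort correctly as plain strings.
    rows.sort(key=lambda r: r["Date"])
    return {
        "start": rows[0]["Date"],                      # timestamp of the first observation
        "target": [float(r["Close"]) for r in rows],   # assumed target: closing price
        # Shape (num_features, num_days): one inner list per feature column.
        "feat_dynamic_real": [
            [float(r[col]) for r in rows]
            for col in ("Open", "High", "Low", "Volume")
        ],
        "item_id": item_id,
    }


# Hypothetical sample data standing in for the real .csv file.
sample = """Date,Open,High,Low,Close,Volume
2023-01-03,10.0,11.0,9.5,10.5,1000
2023-01-04,10.5,12.0,10.0,11.5,1500
2023-01-05,11.5,11.8,10.9,11.0,1200
"""

entry = ohlcv_csv_to_gluonts_entry(sample, item_id="EXAMPLE")
print(entry["start"], entry["target"])
```

A list of such records (one per ticker) can then be split into train/validation/test portions the way the time_series_datasets notebook does. Two caveats: trading days skip weekends and holidays, so a business-day frequency or gap filling is probably needed, and dynamic features generally have to be known over the forecast horizon too, so same-day Open/High/Low/Volume values may need to be lagged or dropped.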


Google Sheets Stacked Chart with data structured as a DB table

On paper it looked simple, but in the end I'm stuck.
Below is the chart I would like to produce (I took an easy example to show the objective):
On this chart, the series for each product are clearly colored and grouped by cart.
Now, let's jump to my problem: the data structure.
In my case, the data comes from a database, so it is structured like this:
I managed to aggregate the values, but I lose the colors of my series, and I absolutely need to identify the different product volumes visually.
Is there any possibility of getting the first stacked chart layout with the second dataset format?
Yes, transform your data first and build the chart from the result (I don't think there's a native solution for this, but I'm happy to learn otherwise).
For example, use =QUERY([data],"select [cart], sum([qty]) group by [cart] pivot [product] label sum([qty]) ''",1) replacing the terms in the brackets with the respective column letters (e.g. B if cart is in column B).
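To make the effect of the pivot clause concrete, here is a rough Python sketch of the same reshaping: long rows of (cart, product, qty) become one row per cart with one summed column per product. The sample rows are made up, and missing combinations are filled with 0 here, whereas the sheet formula would leave those cells empty.

```python
from collections import defaultdict


def pivot_sum(rows):
    """Group by cart, pivot by product, and sum qty (like QUERY ... pivot)."""
    table = defaultdict(lambda: defaultdict(float))
    for cart, product, qty in rows:
        table[cart][product] += qty
    # One output column per distinct product, in alphabetical order.
    products = sorted({product for _, product, _ in rows})
    wide = {
        cart: [cols.get(product, 0.0) for product in products]
        for cart, cols in table.items()
    }
    return products, wide


# Hypothetical database-style rows: (cart, product, qty).
rows = [
    ("cart1", "apple", 2),
    ("cart1", "pear", 1),
    ("cart2", "apple", 3),
    ("cart1", "apple", 1),
]

products, wide = pivot_sum(rows)
print(products)
print(wide)
```

Each product ends up in its own column, which is what lets the stacked chart assign one color per series.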

how to visualize geodata (Points) in databricks builtin map?

I simply have a dataframe of "lat" and "lon" coordinates that I would like to visualize on a map.
I am trying to view my data on the map, but nothing is being displayed.
In my case, the mmsi is the id.
Please note that I have around 2M points to display. Is that possible in Databricks?
If there is no way to plot 2,000,000 points in Databricks, what tool can handle that amount of data?
Any help is much appreciated!!
The built in map does not support latitude / longitude data points.
Delivering 2 million data points directly to a browser is problematic. Databricks has a protective limit of 20 MB for HTML display. Your viewers won't be able to take in and visually process that level of detail at any point in time.
I recommend implementing a filter/summarization strategy to enable the display within Databricks notebooks. Some ideas can be taken from this Databricks post: https://databricks.com/blog/2019/12/05/processing-geospatial-data-at-scale-with-databricks.html
Also, this Anaconda post gives an excellent survey of data visualization packages; some (like Datashader and Vaex) implement visual summarization strategies before rendering images: https://www.anaconda.com/blog/python-data-visualization-2018-why-so-many-libraries
The error message clearly says: "Unrecognizable values in the first column. The values should be either country codes in ISO 3166-1 alpha-3 format (e.g. "GBR") or US state abbreviations (e.g. "TX")."
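As a rough illustration of the summarization idea (not what Datashader or Vaex do internally), the points can be snapped to a coarse lat/lon grid and counted per cell, so that only the cell counts need to reach the browser. The cell size and the sample coordinates below are assumptions.

```python
import math
from collections import Counter


def bin_points(points, cell_deg=1.0):
    """Count points per grid cell; each key is the cell's lower-left corner."""
    counts = Counter()
    for lat, lon in points:
        key = (
            math.floor(lat / cell_deg) * cell_deg,
            math.floor(lon / cell_deg) * cell_deg,
        )
        counts[key] += 1
    return counts


# Hypothetical (lat, lon) pairs standing in for the 2M-row dataframe.
points = [(51.2, 4.4), (51.7, 4.9), (52.1, 4.4), (-0.3, -0.3)]

counts = bin_points(points, cell_deg=1.0)
print(counts)
```

On a Spark dataframe the same idea would be a groupBy on the rounded coordinates followed by a count, which keeps the heavy work on the cluster.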
Note: To plot a graph of the world, use country codes in ISO 3166-1 alpha-3 format as the key.
A Map Graph is a way to visualize your data on a map.
Plot Options... was used to configure the graph below.
Keys should contain the field with the location.
Series groupings is always ignored for World Map graphs.
Values should contain exactly one field with a numerical value.
Since there can be multiple rows with the same location key, choose "Sum", "Avg", "Min", "Max", or "COUNT" as the way to combine the values for a single key.
Different values are denoted by color on the map, and ranges are always spaced evenly.
Reference: Databricks - Charts and Graph
Hope this helps.

Converting sensor tag data in DSX

I'm working on converting the existing recipe for Data Science Experience (DSX) to use data from a connected Sensor Tag device. However, the mobile applications for that device send the data as strings rather than numerics, which causes the DSX recipe that calculates a Z score to choke. The data comes from a Cloudant DB used as a historian for Watson IoT Platform, so I can't simply reformat it there. Is there a simple way to convert the data inside a DSX notebook?
Just access the row object and convert it:
cloudantdata.rdd.map(lambda row : float(row.temperature)).take(10)
EDIT 30.1.17:
To directly address your question:
df = cloudantdata.selectExpr("timestamp as timestamp", "data.d.objectTemp as temperature").rdd.map(lambda row: (row.timestamp, float(row.temperature)))
That way you get an RDD of tuples, which IMHO is more usable than an RDD of Row objects anyway.
I'm not familiar with DSX, but you can use Node-RED to parse the information from the devices and then store it in the Cloudant DB in numeric format.
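Outside Spark, the same conversion step can be sketched in plain Python. This version also skips readings that cannot be parsed, then computes a Z score per reading. The sample values and the use of the population standard deviation are assumptions, since the original recipe's exact Z-score definition isn't shown here.

```python
import statistics


def to_float_or_none(value):
    """Convert a string reading to float; return None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


# Hypothetical temperature readings as they might arrive from the device.
raw = ["21.5", "22.0", "bad", "23.5", None]

temps = [t for t in map(to_float_or_none, raw) if t is not None]
mean = statistics.mean(temps)
stdev = statistics.pstdev(temps)  # population standard deviation (assumption)
z_scores = [(t - mean) / stdev for t in temps]
print(temps)
print(z_scores)
```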

How to pull a selected HTML-formatted chunk of a Google spreadsheet using a URL

I have a tree farm.
I have a Google spreadsheet that has my inventory in the form that I took it.
I have a pivot table that summarizes that sheet.
How can I run a query from the Jack Pine description page on my website that pulls the appropriate blob off the pivot table on the spreadsheet?
Here's what I've done so far:
Created a new spreadsheet that does an importrange() from the individual sheet with my pivot table.
Shared it with the world and published it to the web. Using another browser where I am not logged in with my Google ID, I can see the file, and it is view-only.
https://docs.google.com/spreadsheets/d/13pXb7Kek010B6s8Ez3h6yX4qF92MgvV4uMk71dJhe3o/edit#gid=0
I'm basing this on this article: https://blog.ouseful.info/2009/05/18/using-google-spreadsheets-as-a-databace-with-the-google-visualisation-api-query-language/
Now, I run this query (split across lines for reading convenience):
https://spreadsheets.google.com/d/
13pXb7Kek010B6s8Ez3h6yX4qF92MgvV4uMk71dJhe3o/tq?
tqx=out.html&tq=select+*+where+B+contains+%27Pine,%20Jack%27
And I get the following message:
google.visualization.Query.setResponse({
"version":"0.6","status":"error","errors":
[{"reason":"access_denied","message":"Access denied",
"detailed_message":"Access denied"}]});
Obviously I'm missing something here. How do I troubleshoot this?
Google has changed something. This answer no longer works.
Added Sunday.
The following now will fetch the entire sheet:
https://docs.google.com/spreadsheets/d/
13pXb7Kek010B6s8Ez3h6yX4qF92MgvV4uMk71dJhe3o/
edit?tqx=out.html&tq=select+A,B,C,+where+A+starts+with+%27Pine%27#gid=0
But while it fetches, the select statement returns the entire sheet, or rather the query is ignored.
(I originally had %20's for all the +'s, but Google rewrote them, or my browser does.)
This method
https://docs.google.com/spreadsheets/d/
13pXb7Kek010B6s8Ez3h6yX4qF92MgvV4uMk71dJhe3o/
gviz/tq?tq=select%20A,B,C%20where%20A%20contains%20'Pine'#gid=0
returns a file, json.txt. I don't read JSON, but skimming past the brackets and punctuation, the content is there.
Note the difference around gviz/tq...
Google rewrites the URL, removing tq? from it.
I cannot leave tqx=out.html in place. With it, I get no JSON file and a 'file unavailable' error.
It turns out that what I need is tqx=out:html, with a colon rather than a period.
Found the information in a table labeled "Request Format" in the document
https://developers.google.com/chart/interactive/docs/dev/implementing_data_source
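For anyone building these URLs in code rather than by hand, here is a minimal sketch that combines the gviz/tq path with tqx=out:html and lets the standard library do the percent-encoding. The function name gviz_url and the SHEET_ID placeholder are made up for this example.

```python
from urllib.parse import quote, urlencode


def gviz_url(sheet_id, query, out="html"):
    """Build a Google Visualization query URL for a published spreadsheet."""
    params = {"tqx": f"out:{out}", "tq": query}
    # Keep ':' and ',' readable; spaces become %20 and quotes become %27.
    encoded = urlencode(params, safe=":,", quote_via=quote)
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?{encoded}"


# SHEET_ID is a placeholder for the long ID in the spreadsheet's address.
url = gviz_url("SHEET_ID", "select A,B,C where A contains 'Pine'")
print(url)
```

Encoding the query this way sidesteps the question of whether to type + or %20 for spaces in the address bar.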

How to use Large Object in PostgreSQL to yield an image field?

How does creating a large object work? Does there need to be a client, because all I am hoping to do is have an image be one column.
I am typing the following commands after creating my table but I just get an error about the path not being correct for the image (even though I have it starting right from the C drive).
CREATE TABLE image (name text,
raster oid);
INSERT INTO image (name, raster)
VALUES ('beautiful image', lo_import('C:Documents/etc/motd'));
I am not running any C code. Am I supposed to, or does this automatically create the Large Object?
If I am supposed to run some C code, where would I do it with respect to PostgreSQL?
Can I do what I want all with PostgreSQL syntax? Is there another way to approach including images as a field?
Any help will be greatly appreciated.
According to the PostgreSQL documentation, there are two ways to handle large objects (considering Java JDBC):
To use the bytea data type you should simply use the getBytes(), setBytes(), getBinaryStream(), or setBinaryStream() methods.
and
LargeObject API.
Also, you can convert your image to a base64 string and then insert it directly using, for instance, PgAdmin:
CREATE TABLE image_table (name varchar(255), DATA bytea);
INSERT INTO image_table
VALUES ('my_image.jpg',
decode('paste your byte array string here', 'base64'));
Full sample code here.
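If it helps, the base64 string can be produced with a few lines of Python. This sketch encodes the image bytes and fills in the INSERT statement above, using the same image_table layout. The helper name and the stand-in bytes are made up, and in application code a parameterized query through a database driver would usually be preferable to pasting text into SQL.

```python
import base64


def image_insert_sql(name, image_bytes):
    """Build an INSERT for image_table with the image as a base64 literal."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    safe_name = name.replace("'", "''")  # escape single quotes for SQL
    return (
        "INSERT INTO image_table VALUES "
        f"('{safe_name}', decode('{encoded}', 'base64'));"
    )


# Stand-in bytes (the PNG file signature); a real script would read the
# file instead, e.g. with open(path, "rb").
fake_image = b"\x89PNG\r\n\x1a\n"

sql = image_insert_sql("my_image.png", fake_image)
print(sql)
```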
