Converting sensor tag data in DSX - cloudant

I'm working on converting the existing recipe for Data Science Experience (DSX) to use data from a connected Sensor Tag device. However, the mobile applications for that device send the data as strings rather than numerics, which causes the DSX recipe that calculates a Z score to choke. The data comes from a Cloudant DB used as a historian for Watson IoT Platform, so I can't simply reformat it there. Is there a simple way to convert the data inside a DSX notebook?

Just access the row object and convert it:
cloudantdata.rdd.map(lambda row : float(row.temperature)).take(10)
EDIT 30.1.17:
To directly address your question:
df = cloudantdata.selectExpr("timestamp as timestamp", "data.d.objectTemp as temperature").rdd.map(lambda row: (row.timestamp, float(row.temperature)))
That way you get an RDD of tuples, which IMHO is more usable than an RDD of Row objects anyway.
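The conversion logic can also be checked outside Spark. The following is a minimal plain-Python sketch, not the DSX recipe itself: the helper names `to_float` and `z_scores` are hypothetical, and dropping unparsable readings is an assumption about how bad values should be handled.

```python
from statistics import mean, pstdev


def to_float(value):
    """Convert a string reading to float; return None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def z_scores(raw_values):
    """Parse string readings, drop unparsable ones, and return their Z scores."""
    values = [v for v in map(to_float, raw_values) if v is not None]
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All readings identical: every Z score is zero by convention here.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    print(z_scores(["21.5", "22.0", "n/a", "22.5"]))
```

The same `to_float` function could be passed to the `map` call shown above in place of the bare `float`, if malformed strings are expected in the Cloudant data.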

I'm not familiar with DSX, but you can use Node-RED to parse the data coming from the devices and then store it in the Cloudant DB in numeric format.

Related

How to create a GluonTS dataset from Stock/Equity, OHLCV, price data

I have been struggling to build a GluonTS dataset from a .csv file populated with OHLCV equity data. Does anyone know the best way to get a dataset that is compatible with GluonTS and HuggingFace from a file like the one depicted in my first screenshot into a databaseDict object like the one depicted in my second screenshot?
This is the notebook I'm trying to make my data compatible with:
https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/time-series-transformers.ipynb
Towards the end of the notebook, it suggests using the following notebook to make your own datasets:
https://github.com/huggingface/notebooks/blob/main/examples/time_series_datasets.ipynb
This is the notebook where I've gotten closest to making it work, but I can't get it to work in a way that actually makes sense:
https://colab.research.google.com/drive/1sQvyZsTSpwhIcmcyg-c3YCfm4qJt6VhZ?usp=sharing
There should be a dataset entry for every day of price data.

How to Extract .owl and save to mysql

I have a file, ontobible.owl. How can I extract its contents and save the data to MySQL? (I want to display data from ontobible.owl on a website.) Can anyone help me?
Edited:
Here is my ontobible.owl file (https://teamtrainit.com/ontobible.owl).
I tried opening ontobible.owl with Sublime Text 3, and it contains entries like this:
<Verse rdf:about="http://www.semanticweb.org/budsus/ontologies/2021/7/ontobible#HOS5_2">
<verseID>HOS5_2</verseID>
<verse_text>And the revolters are profound to make slaughter, though I have been a rebuker of them all.</verse_text>
</Verse>
<Verse rdf:about="http://www.semanticweb.org/budsus/ontologies/2021/7/ontobible#2CH2_1">
<hasPerson rdf:resource="http://semanticbible.org/ns/2006/NTNames#god_1324"/>
<hasPerson rdf:resource="http://www.co-ode.org/roberts/family-tree.owl#solomon_2762"/>
<verseID>2CH2_1</verseID>
<verse_text>And Solomon determined to build an house for the name of the LORD, and an house for his kingdom.</verse_text>
</Verse>
How can I convert these XML tags to an array or JSON so I can save them to a MySQL database?
You have several options for extracting data from OWL:
1. Use the OWL API and write Java code (I think the OWL API is accessible from other languages) to extract the data and pack it in the format you need. You can also use SPARQL queries to extract data via the Jena API.
2. Install Protégé, open your file in it and save it in JSON-LD format. This format is very similar to regular JSON, and you can easily transform it for your needs.
3. Install a Fuseki server, add your file, and extract the data from there using SPARQL queries.
I think the second option is the easiest to start with if you don't want to write queries or code, and it won't take long.
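If only the Verse entries shown in the question are needed, a plain XML parse may be enough. The sketch below uses Python's standard library and is not one of the options listed above. It assumes the file is RDF/XML whose default namespace is the ontobible URI seen in the rdf:about attributes; the wrapper element in `SAMPLE` and the function name `verses_to_records` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Assumed namespaces: ONTO is taken from the rdf:about values in the question,
# RDF is the standard RDF syntax namespace.
ONTO = "http://www.semanticweb.org/budsus/ontologies/2021/7/ontobible#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical wrapper around the two entries quoted in the question.
SAMPLE = f"""<rdf:RDF xmlns="{ONTO}" xmlns:rdf="{RDF}">
<Verse rdf:about="{ONTO}HOS5_2">
<verseID>HOS5_2</verseID>
<verse_text>And the revolters are profound to make slaughter, though I have been a rebuker of them all.</verse_text>
</Verse>
<Verse rdf:about="{ONTO}2CH2_1">
<hasPerson rdf:resource="http://semanticbible.org/ns/2006/NTNames#god_1324"/>
<hasPerson rdf:resource="http://www.co-ode.org/roberts/family-tree.owl#solomon_2762"/>
<verseID>2CH2_1</verseID>
<verse_text>And Solomon determined to build an house for the name of the LORD, and an house for his kingdom.</verse_text>
</Verse>
</rdf:RDF>"""


def verses_to_records(xml_text):
    """Return one dict per Verse element, ready to serialize or insert into a table."""
    root = ET.fromstring(xml_text)
    records = []
    for verse in root.iter(f"{{{ONTO}}}Verse"):
        records.append({
            "uri": verse.get(f"{{{RDF}}}about"),
            "verse_id": verse.findtext(f"{{{ONTO}}}verseID"),
            "verse_text": verse.findtext(f"{{{ONTO}}}verse_text"),
            "persons": [p.get(f"{{{RDF}}}resource")
                        for p in verse.findall(f"{{{ONTO}}}hasPerson")],
        })
    return records


if __name__ == "__main__":
    print(json.dumps(verses_to_records(SAMPLE), indent=2))
```

Each record maps naturally onto a row in a verses table, with the `persons` list going into a separate link table. For anything beyond flat entries like these, an RDF-aware tool such as the ones listed above is the safer choice.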

how to visualize geodata (Points) in databricks builtin map?

I simply have a dataframe of "lat" and "lon" coordinates that I would like to visualize on a map.
I am trying to view my data on the map, but nothing is being displayed.
In my case, the mmsi is the id.
Please note that I have around 2M points to display. Is that possible in Databricks?
If there is no way to plot 2,000,000 points in Databricks, what tool can handle that amount of data?
Any help is much appreciated!!
The built-in map does not support latitude/longitude data points.
Delivering 2 million data points directly to a browser is problematic. Databricks has a protective limit of 20 MB for HTML display. Your viewers won't be able to take in and visually process that level of detail at any point in time.
I recommend implementing a filter/summarization strategy to enable the display within Databricks notebooks. Some ideas can be taken from this Databricks post: https://databricks.com/blog/2019/12/05/processing-geospatial-data-at-scale-with-databricks.html
Also, this Anaconda post gives an excellent survey of data visualization packages, some of which (like Datashader and Vaex) apply a visual summarization strategy before rendering images: https://www.anaconda.com/blog/python-data-visualization-2018-why-so-many-libraries
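One simple form of the summarization strategy mentioned above is to snap points to a grid and plot one marker per cell, weighted by count. This is a minimal plain-Python sketch of that idea, not a Databricks API; the function name `bin_points` and the 1-degree default cell size are assumptions.

```python
import math
from collections import Counter


def bin_points(points, cell_deg=1.0):
    """Aggregate (lat, lon) points into square grid cells.

    Returns a list of (cell_center_lat, cell_center_lon, count) tuples,
    sorted by cell index.
    """
    counts = Counter()
    for lat, lon in points:
        # Integer cell indices; floor keeps negative coordinates in the right cell.
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[key] += 1
    return [((i + 0.5) * cell_deg, (j + 0.5) * cell_deg, n)
            for (i, j), n in sorted(counts.items())]


if __name__ == "__main__":
    sample = [(10.2, 20.7), (10.9, 20.1), (-0.5, 3.3)]
    print(bin_points(sample))
```

On a Spark DataFrame the same aggregation would typically be expressed as a group-by on the two floored columns followed by a count, so that only the per-cell totals are sent to the browser.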
The error message clearly says: "Unrecognizable values in the first column. The values should be either country codes in ISO 3166-1 alpha-3 format (e.g. "GBR") or US state abbreviations (e.g. "TX")."
Note: To plot a graph of the world, use country codes in ISO 3166-1 alpha-3 format as the key.
A Map Graph is a way to visualize your data on a map.
Plot Options... was used to configure the graph below.
Keys should contain the field with the location.
Series groupings is always ignored for World Map graphs.
Values should contain exactly one field with a numerical value.
Since there can be multiple rows with the same location key, choose "Sum", "Avg", "Min", "Max", or "COUNT" as the way to combine the values for a single key.
Different values are denoted by color on the map, and ranges are always spaced evenly.
Reference: Databricks - Charts and Graph
Hope this helps.

How to use Large Object in PostgreSQL to yield an image field?

How does creating a large object work? Does there need to be a client? All I am hoping to do is store an image in one column.
I am typing the following commands after creating my table, but I just get an error about the path not being correct for the image (even though it starts right at the C drive).
CREATE TABLE image (name text,
raster oid);
INSERT INTO image (name, raster)
VALUES ('beautiful image', lo_import('C:Documents/etc/motd'));
I am not running any C code. Am I supposed to, or does this automatically create the Large Object?
If I am supposed to run some C code, where would I do it with respect to PostgreSQL?
Can I do what I want all with PostgreSQL syntax? Is there another way to approach including images as a field?
Any help will be greatly appreciated.
According to the PostgreSQL documentation, there are two ways to handle large objects (considering Java JDBC):
1. The bytea data type, for which you simply use the getBytes(), setBytes(), getBinaryStream(), or setBinaryStream() methods.
2. The LargeObject API.
Also, you can convert your image to a base64 string and then insert it directly using, for instance, pgAdmin:
CREATE TABLE image_table (name varchar(255), DATA bytea);
INSERT INTO image_table
VALUES ('my_image.jpg',
decode('paste your byte array string here', 'base64'));
Full sample code here.
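The base64 string for the statement above can be produced with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the `%s` placeholders follow the style used by drivers such as psycopg2, and no database connection is made here.

```python
import base64


def image_to_base64(data: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(data).decode("ascii")


def build_insert(name: str, data: bytes):
    """Return a parameterized INSERT for image_table and its parameters.

    decode(..., 'base64') turns the string back into bytea on the server.
    """
    sql = ("INSERT INTO image_table (name, data) "
           "VALUES (%s, decode(%s, 'base64'))")
    return sql, (name, image_to_base64(data))


if __name__ == "__main__":
    # Stand-in bytes; in practice, read them with open(path, "rb").read().
    sql, params = build_insert("my_image.jpg", b"\x89PNG\r\n")
    print(sql)
    print(params)
```

Reading the file on the client side avoids the path problem in the question, because lo_import() with a file path reads from the database server's filesystem rather than the client's.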

Reading edge list data set in apache giraph?

I'm using SNAP dataset for social network analysis. SNAP uses simple edge list as a data format. How to read SNAP dataset in Apache Giraph?
As far as I know, SNAP uses various data formats depending on which dataset you are looking at. If your dataset has one "sourceid destinationid" pair on each line, you might want to use IntNullTextEdgeInputFormat (it's in giraph-core/src/main/java/org/apache/giraph/io/formats).
Also take a look at the other predefined formats available in the same folder. If none of them fits your dataset, you can write your own input format class (this is really simple if you start from one of the predefined formats and edit it as needed).
Use -eif org.apache.giraph.io.formats.IntNullTextEdgeInputFormat
Yes, SNAP uses a simple edge list format for representing graphs. You can use this code to convert it to a JSON format that is accepted by Apache Giraph.
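The code referred to above is not reproduced here. As a rough illustration of such a conversion, the sketch below turns edge-list lines into one JSON array per vertex in the shape [vertex_id, vertex_value, [[target_id, edge_value], ...]], which is the layout read by Giraph's JsonLongDoubleFloatDoubleVertexInputFormat. The function name and the default vertex and edge values (0 and 1) are assumptions, and lines starting with "#" are treated as comments.

```python
import json
from collections import defaultdict


def edge_list_to_giraph_json(lines, vertex_value=0, edge_value=1):
    """Convert 'src dst' edge-list lines into one JSON vertex line per vertex."""
    adjacency = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        src, dst = (int(tok) for tok in line.split()[:2])
        adjacency[src].append(dst)
        adjacency.setdefault(dst, [])  # make sure sink vertices get a line too
    return [
        json.dumps([v, vertex_value, [[d, edge_value] for d in adjacency[v]]],
                   separators=(",", ":"))
        for v in sorted(adjacency)
    ]


if __name__ == "__main__":
    sample = ["# FromNodeId ToNodeId", "0 1", "0 2", "1 2"]
    for out_line in edge_list_to_giraph_json(sample):
        print(out_line)
```

This builds the whole adjacency map in memory, so it suits small and medium datasets; for larger graphs, reading the edge list directly with the -eif option above avoids the conversion step entirely.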

Resources