Getting started using Tensorflow JS models and training. Prediction of solar power production - tensorflow.js

I want to ask a probably very noob question here about my first steps with TensorFlow.js. I want to create a model that can predict the daily solar energy production based on the clouds percentage and temperature during the hours of a day.
I have three arrays for this question:
const production = [["0.00","0.00","0.00","0.00","0.00","0.03","0.20","0.42","0.85","1.51","1.58","1.46","1.68","1.68","0.51","0.24","0.14","0.05","0.00","0.00","0.00","0.00","0.00","0.00"],["0.00","0.00","0.00","0.00","0.00","0.01","0.12","0.29","0.81","1.42","1.62","2.09","2.26","1.77","0.44","0.20","0.11","0.04","0.00","0.00","0.00","0.00","0.00","0.00"]]
const clouds = [["0.90","0.90","0.90","0.90","0.80","0.75","0.75","0.75","0.72","0.27","0.34","0.58","0.35","0.20","0.20","0.20","0.20","0.17","0.20","0.20","0.20","0.20","0.20","0.01"],["0.74","0.20","0.22","0.39","0.79","0.75","0.75","0.75","0.50","0.40","0.40","0.40","0.40","0.40","0.26","0.30","0.20","0.20","0.20","0.33","0.77","0.90","0.90","0.90"]]
const temp = [["15.50","15.22","14.65","14.35","13.84","14.46","15.97","17.08","18.30","19.51","20.39","21.00","21.60","21.94","21.94","21.63","20.89","19.32","17.58","16.40","15.63","14.86","14.16","13.64"],["14.98","14.97","14.89","14.51","14.47","15.14","15.83","16.69","17.89","19.10","20.06","20.84","21.46","21.91","21.58","20.99","19.75","18.06","16.75","15.74","15.49","15.26","15.29","15.45"]]
All these arrays are actual measurements for 2 days.
Production is in kWh (for who's interested) ;-)
All these arrays consist of subarrays of 24 elements (one for every hour, starting at 01:00), because the hour of the day is of course very important for production (day or night).
I want to use TensorFlow.js to predict the production for a day after asking a weather API for the cloud percentage and temperature for every hour of that day. I would pass the results into my model as [clouds] [temp] and receive a new array [production] from the model. Each array consists of 24 numbers representing the hours of the day. I think it is sequential, because normally the following equation should work: [production] = a + b*[clouds] + c*[temp]
I experimented a lot using the Boston housing example but I don't seem to be able to fit my parameters to this model. If anyone could fit my measurements in the example, that would be really cool.
For the moment I'm working with this class:
import * as tf from '@tensorflow/tfjs';
/**
 * Linear model class
 */
export default class LinearModel {
  /**
   * Train model
   */
  async trainModel(temp, clouds, production) {
    const layers = tf.layers.dense({
      units: 24, // Dimensionality of the output space
      inputShape: [24], // Only one param
    });
    const lossAndOptimizer = {
      loss: 'meanSquaredError',
      optimizer: 'sgd', // Stochastic gradient descent
    };
    this.linearModel = tf.sequential();
    this.linearModel.add(layers); // Add the layer
    this.linearModel.compile(lossAndOptimizer);
    // Start the model training!
    await this.linearModel.fit(
      [tf.tensor2d(temp), tf.tensor2d(clouds)],
      tf.tensor2d(production),
    );
  }
}
And the call is like this:
const model = new LinearModel()
await model.trainModel(temp, clouds, production)
As you can probably notice, the code fails for now. And it has something to do with the 2d-3d dimensions of my values. But I don't really find a clue.
The error logs:
ValueError: Error when checking model input: the Array of Tensors that you are passing to your model is not the size the model expected. Expected to see 1 Tensor(s), but instead got the following list of Tensor(s): Tensor
[['13.14', '12.70', '12.09', ..., '16.22', '16.11', '15.96'],
['15.82', '15.84', '15.73', ..., '15.34', '14.31', '13.30'],
['12.83', '12.36', '11.86', ..., '14.29', '13.87', '13.66'],
['13.62', '13.67', '13.68', ..., '14.89', '14.42', '14.05'],
['13.95', '14.00', '14.14', ..., '16.27', '16.24', '16.20'],
['16.23', '16.14', '16.00', ..., '15.39', '15.10', '14.82'],
['14.98', '14.97', '14.89', ..., '15.26', '15.29', '15.45'],
['15.41', '15.35', '15.23', ..., '15.91', '15.65', '15.69'],
['15.50', '15.22', '14.65', ..., '14.86', '14.16', '13.64'],
['13.23', '12.88', '12.55', ..., '16.76', '16.10', '15.50'],
['14.83', '14.22', '13.59', ..., '15.49', '14.84', '14.20'],
['13.70', '13.23', '12.89', ..., '19.00', '19.00', '19.00'],
['19.00', '19.00', '19.00', ..., '16.09', '15.43', '14.77'],
['14.55', '14.09', '19.00', ..., '18.50', '17.10', '16.29'],
['15.78', '15.13', '14.75', ..., '19.00', '17.05', '16.83'],
['16.55', '16.06', '15.70', ..., '19.00', '18.49', '18.50'],
['18.22', '18.27', '18.08', ..., '16.93', '16.27', '16.00'],
['15.98', '16.12', '16.19', ..., '19.00', '19.00', '19.00'],
['19.00', '19.00', '19.00', ..., '19.00', '19.00', '19.00'],
['19.00', '19.00', '19.00', ..., '13.16', '18.64', '17.90']],Tensor
[['0.22', '0.28', '0.17', ..., '0.79', '0.90', '0.80'],
['0.86', '0.87', '0.75', ..., '0.27', '0.20', '0.20'],
['0.20', '0.20', '0.20', ..., '0.20', '0.20', '0.20'],
['0.20', '0.20', '0.20', ..., '0.66', '0.20', '0.13'],
['0.78', '0.29', '0.46', ..., '0.90', '0.82', '0.90'],
['0.90', '0.84', '0.90', ..., '0.25', '0.53', '0.90'],
['0.74', '0.20', '0.22', ..., '0.90', '0.90', '0.90'],
['0.90', '0.90', '0.90', ..., '0.44', '0.68', '0.90'],
['0.90', '0.90', '0.90', ..., '0.20', '0.20', '0.01'],
['0.06', '0.06', '0.06', ..., '0.20', '0.20', '0.20'],
['0.20', '0.20', '0.25', ..., '0.03', '0.02', '0.04'],
['0.10', '0.10', '0.09', ..., '0.20', '0.20', '0.20'],
['0.20', '0.20', '0.20', ..., '0.20', '0.55', '0.59'],
['0.85', '0.69', '0.20', ..., '0.20', '0.20', '0.20'],
['0.20', '0.19', '0.27', ..., '0.20', '0.20', '0.20'],
['0.20', '0.20', '0.20', ..., '0.20', '0.97', '0.96'],
['0.20', '0.20', '0.20', ..., '0.20', '0.20', '0.90'],
['0.90', '0.90', '0.90', ..., '0.20', '0.20', '0.20'],
['0.20', '0.20', '0.20', ..., '0.20', '0.20', '0.20'],
['0.20', '0.20', '0.20', ..., '0.20', '0.54', '0.20']]
at new ValueError (/Users/hacor/regression-test/node_modules/@tensorflow/tfjs-layers/dist/tf-layers.node.js:9827:28)
at standardizeInputData (/Users/hacor/regression-test/node_modules/@tensorflow/tfjs-layers/dist/tf-layers.node.js:18254:19)
at LayersModel.standardizeUserDataXY (/Users/hacor/regression-test/node_modules/@tensorflow/tfjs-layers/dist/tf-layers.node.js:19125:13)
at LayersModel.<anonymous> (/Users/hacor/regression-test/node_modules/@tensorflow/tfjs-layers/dist/tf-layers.node.js:19147:35)
at step (/Users/hacor/regression-test/node_modules/@tensorflow/tfjs-layers/dist/tf-layers.node.js:9745:23)
at Object.next (/Users/hacor/regression-test/node_modules/@tensorflow/tfjs-layers/dist/tf-layers.node.js:9726:53)
at /Users/hacor/regression-test/node_modules/@tensorflow/tfjs-layers/dist/tf-layers.node.js:9719:71
at new Promise (<anonymous>)
at __awaiter (/Users/hacor/regression-test/node_modules/@tensorflow/tfjs-layers/dist/tf-layers.node.js:9715:12)
at LayersModel.standardizeUserData (/Users/hacor/regression-test/node_modules/@tensorflow/tfjs-layers/dist/tf-layers.node.js:19142:16)
Any help would be appreciated!
Best regards,
Hacor

The input is always a single tensor and you're sending two, so you need to combine temp and clouds into a single tensor.
There are a number of ways to do that, but most likely you want to have each as a separate dimension, so just stack them:
const inputs = tf.stack([temp, clouds]);
trainModel(inputs, production)
...
By the way, your model consists of a single dense layer with no activation, so it can only learn a linear mapping and is not going to do much.
Take a look at https://github.com/vladmandic/stocks/blob/main/src/model.js to see what I'm referring to.
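To make the shape bookkeeping behind "combine them into a single tensor" concrete, here is a small plain-Python sketch (no TensorFlow involved, and the days are shortened to 3 hours). The helper names `to_floats` and `shape` are made up for this illustration, and the per-day layout in option 2 is an assumption about what a dense layer with one flat input vector would want, not something stated in the answer above.

```python
# Plain-Python illustration of the shape bookkeeping; no TensorFlow involved.
# Two days, shortened to 3 hours each to keep the example small.
temp = [["15.50", "15.22", "14.65"], ["14.98", "14.97", "14.89"]]
clouds = [["0.90", "0.90", "0.80"], ["0.74", "0.20", "0.22"]]


def to_floats(days):
    """Convert every string measurement to a float."""
    return [[float(value) for value in day] for day in days]


def shape(nested):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


temp_f, clouds_f = to_floats(temp), to_floats(clouds)

# Option 1: stack along a new leading axis -> (inputs, days, hours).
stacked = [temp_f, clouds_f]

# Option 2 (assumption): one row per day holding all temperatures followed
# by all cloud values, i.e. a single flat feature vector per day.
per_day = [t + c for t, c in zip(temp_f, clouds_f)]

print(shape(stacked))  # (2, 2, 3)
print(shape(per_day))  # (2, 6)
```

Note that the measurements are strings in the question, so they have to be converted to numbers before they can be used for training, whichever layout is chosen.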

Related

Plotting many pie charts using a loop to create a single figure using matplotlib

I'm having trouble converting a script I wrote to create and save 15 pie charts separately which I would like to save as a single figure with 15 subplots instead. I have tried taking fig, ax = plt.subplots(5, 3, figsize=(7, 7)) out of the loop and specifying the number of rows and columns for the plot but I get this error AttributeError: 'numpy.ndarray' object has no attribute 'pie'. This error doesn't occur if I leave that bit of code in the script as is seen below. Any help with tweaking the code below to create a single figure with 15 subplots (one for each site) would be enormously appreciated.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel(path)
df_1 = df.groupby(['Site', 'group'])['Abundance'].sum().reset_index(name='site_count')
site = ['Ireland', 'England', 'France', 'Scotland', 'Italy', 'Spain',
'Croatia', 'Sweden', 'Denmark', 'Germany', 'Belgium', 'Austria', 'Poland', 'Stearman', 'Hungary']
for i in site:
    df_1b = df_1.loc[df_1['Site'] == i]
    colors = {'Dog': 'orange', 'Cat': 'cyan', 'Pig': 'darkred', 'Horse': 'lightcoral', 'Bird':
              'grey', 'Rat': 'lightsteelblue', 'Whale': 'teal', 'Fish': 'plum', 'Shark': 'darkgreen'}
    wp = {'linewidth': 1, 'edgecolor': "black"}
    fig, ax = plt.subplots(figsize=(7, 7))
    texts, autotexts = ax.pie(df_1b['site_count'],
                              labels=None,
                              shadow=False,
                              colors=[colors[key] for key in labels],
                              startangle=90,
                              wedgeprops=wp,
                              textprops=dict(color="black"))
    plt.setp(autotexts, size=16)
    ax.set_title(site, size=16, weight="bold", y=0)
    plt.savefig('%s_group_diversity.png' % i, bbox_inches='tight', pad_inches=0.05, dpi=600)
It's hard to guess exactly what you'd like the plot to look like.
The main changes the code below makes are:
adding fig, axs = plt.subplots(nrows=5, ncols=3, figsize=(12, 18)). Here axs is a 2d array of subplots. figsize should be large enough to fit the 15 subplots.
df_1b['group'] is used for the labels that decide the color (it's not clear where the labels themselves should be shown, maybe in a common legend)
autopct='%.1f%%' is added to ax.pie(...). This shows the percentages with one decimal.
With autopct, ax.pie(...) now returns 3 lists: wedges, texts, autotexts. The wedges are the pie slices, the texts are the text objects for the labels (currently empty texts), and the autotexts are the percentages (that are calculated "automatically").
ax.set_title now uses the site name, and puts it at a negative y-value (y=0 would overlap with the pie)
plt.tight_layout() at the end tries to optimize the surrounding white space
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
site = ['Ireland', 'England', 'France', 'Scotland', 'Italy', 'Spain',
'Croatia', 'Sweden', 'Denmark', 'Germany', 'Belgium', 'Austria', 'Poland', 'Stearman', 'Hungary']
colors = {'Dog': 'orange', 'Cat': 'cyan', 'Pig': 'darkred', 'Horse': 'lightcoral', 'Bird': 'grey',
'Rat': 'lightsteelblue', 'Whale': 'teal', 'Fish': 'plum', 'Shark': 'darkgreen'}
wedge_properties = {'linewidth': 1, 'edgecolor': "black"}
# create some dummy test data
df = pd.DataFrame({'Site': np.random.choice(site, 1000),
'group': np.random.choice(list(colors.keys()), 1000),
'Abundance': np.random.randint(1, 11, 1000)})
df_1 = df.groupby(['Site', 'group'])['Abundance'].sum().reset_index(name='site_count')
fig, axs = plt.subplots(nrows=5, ncols=3, figsize=(12, 18))
for site_i, ax in zip(site, axs.flat):
    df_1b = df_1[df_1['Site'] == site_i]
    labels = df_1b['group']
    # ax.pie returns the wedges first, then the label texts, then the percentage texts
    wedges, texts, autotexts = ax.pie(df_1b['site_count'],
                                      labels=None,
                                      shadow=False,
                                      colors=[colors[key] for key in labels],
                                      startangle=90,
                                      wedgeprops=wedge_properties,
                                      textprops=dict(color="black"),
                                      autopct='%.1f%%')
    plt.setp(autotexts, size=10)
    ax.set_title(site_i, size=16, weight="bold", y=-0.05)
plt.tight_layout()
plt.savefig('group_diversity.png', bbox_inches='tight', pad_inches=0.05, dpi=600)
plt.show()

How to change the order in dimensions xarray.Dataset?

I am creating a xarray dataset as below:
import numpy as np
import xarray as xr
x_example = np.random.rand(1488,)
y_example = np.random.rand(1331,)
time_example = np.random.rand(120,)
rainfall_example = np.random.rand(120, 1331, 1488)
rainfall_dataset = xr.Dataset(
data_vars=dict(
rainfall_depth=(['time', 'y', 'x'], rainfall_example),
),
coords=dict(
time=(['time'], time_example),
x=(['x'], x_example),
y=(['y'], y_example)
)
)
The results are like this
And the dimensions when I run rainfall_dataset.dims are like this: Frozen({'time': 120, 'y': 1331, 'x': 1488}) (this can also be seen in the above results). I know that xarray.Dataset.dims cannot be modified, according to here.
My question is: how can we change the order of those dimensions to Frozen({'time': 120, 'x': 1488, 'y': 1331}) without changing anything else (everything stays the same, only the order of the dimensions changes)?
You can reorder your coordinates and variables by selecting them both in order using a list:
In [3]: rainfall_dataset[["time", "y", "x", "rainfall_depth"]]
Out[3]:
<xarray.Dataset>
Dimensions:         (time: 120, y: 1331, x: 1488)
Coordinates:
  * time            (time) float64 0.2848 0.7556 0.9501 ... 0.694 0.734 0.198
  * y               (y) float64 0.1941 0.1132 0.2504 ... 0.1501 0.5085 0.006135
  * x               (x) float64 0.2776 0.4504 0.1886 ... 0.4071 0.3327 0.5555
Data variables:
    rainfall_depth  (time, y, x) float64 ...

Importing data from multiple .csv files into single DataFrame

I'm having trouble getting data from several .csv files into a single array. I can get all of the data from the .csv files fine, I just can't get everything into a simple numpy array. The name of each .csv file is important to me so in the end I'd like to have a Pandas DataFrame with the columns labeled by the initial name of the .csv file.
import glob
import numpy as np
import pandas as pd
files = glob.glob("*.csv")
temp_dict = {}
wind_dict = {}
for file in files:
    data = pd.read_csv(file)
    temp_dict[file[:-4]] = data['HLY-TEMP-NORMAL'].values
    wind_dict[file[:-4]] = data['HLY-WIND-AVGSPD'].values
temp = []
wind = []
name = []
for word in temp_dict:
    name.append(word)
    temp.append(temp_dict[word])
for word in wind_dict:
    wind.append(wind_dict[word])
temp = np.array(temp)
wind = np.array(wind)
When I print temp or wind I get something like this:
[array([ 32.1, 31.1, 30.3, ..., 34.9, 33.9, 32.9])
array([ 17.3, 17.2, 17.2, ..., 17.5, 17.5, 17.2])
array([ 41.8, 41.1, 40.6, ..., 44.3, 43.4, 42.6])
...
array([ 32.5, 32.2, 31.9, ..., 34.8, 34.1, 33.7])]
when what I really want is:
[[ 32.1, 31.1, 30.3, ..., 34.9, 33.9, 32.9]
[ 17.3, 17.2, 17.2, ..., 17.5, 17.5, 17.2]
[ 41.8, 41.1, 40.6, ..., 44.3, 43.4, 42.6]
...
[ 32.5, 32.2, 31.9, ..., 34.8, 34.1, 33.7]]
This does not work but is the goal of my code:
df = pd.DataFrame(temp, columns=name)
And when I try to use a DataFrame from Pandas, each row is its own array, which isn't helpful because it thinks every row has only one element in it. I know the problem is with "array(...)", I just don't know how to get rid of it. Thank you in advance for your time and consideration.
I think you can use:
files = glob.glob("*.csv")
#read each file to list of DataFrames
dfs = [pd.read_csv(fp) for fp in files]
#create names for each file
lst4 = [x[:-4] for x in files]
#create one big df with MultiIndex by files names
df = pd.concat(dfs, keys=lst4)
If you want separate DataFrames, replace the last line of the solution above and reshape with unstack:
df = pd.concat(dfs, keys=lst4).unstack()
df_temp = df['HLY-TEMP-NORMAL']
df_wind = df['HLY-WIND-AVGSPD']
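The "array of arrays" output in the question usually means the per-file columns do not all have the same length, so they cannot form a rectangular 2D array. Below is a minimal standard-library sketch (not the pandas approach above) that collects one column per file, keyed by file name, and checks that the lengths match. The file names and contents are invented stand-ins, and `read_column` is a helper made up for this illustration.

```python
import csv
import io

# Hypothetical in-memory stand-ins for two station files; real code would
# open the paths returned by glob.glob("*.csv") instead.
fake_files = {
    "station_a.csv": "HLY-TEMP-NORMAL,HLY-WIND-AVGSPD\n32.1,5.0\n31.1,4.8\n",
    "station_b.csv": "HLY-TEMP-NORMAL,HLY-WIND-AVGSPD\n17.3,7.1\n17.2,7.0\n",
}


def read_column(text, column):
    """Return one named column of a CSV text as a list of floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row[column]) for row in reader]


# One entry per file, keyed by the file name without its ".csv" suffix.
temp_by_file = {
    name[:-4]: read_column(text, "HLY-TEMP-NORMAL")
    for name, text in fake_files.items()
}

# A rectangular table needs every column to have the same number of rows.
lengths = {len(values) for values in temp_by_file.values()}
print(temp_by_file)
print("rectangular:", len(lengths) == 1)
```

If the check reports more than one length, the files cover different numbers of hours, and that mismatch is worth resolving before building the DataFrame.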

Appending data in Python

I have time data with dimensions (95,). I wrote the following code to extract the year, month and day in order to create an array of dimension (95,3). However, the code below only creates an array of dimension (285,). How can I create the new time array with dimension (95,3), where the first column represents the year, the second column the month and the last column the day?
newtime = np.array([])
for i in range(len(time)):
    a = seconds_since_jan_1_1993_to_datetime(time[i])
    time_year = float(a.strftime("%Y"))
    time_mon = float(a.strftime("%m"))
    time_day = float(a.strftime("%d"))
    newtime = np.append(newtime, np.array([time_year, time_mon, time_day]))
For example, I have an input array with elements array([725696054.99044609, 725696056.99082708, 725696058.99119401, ...])
I want an output of the following form:
Col1 Col2 Col3
2015.0 12.0 31.0
2015.0 12.0 31.0
2015.0 12.0 31.0
Look forward to your suggestions or help.
My suggestion would be working with a dataframe format.
An easy fix to your code would be:
newtime = pd.DataFrame([], columns=['year','month','day'])
for i in range(len(time)):
    a = seconds_since_jan_1_1993_to_datetime(time[i])
    time_year = float(a.strftime("%Y"))
    time_mon = float(a.strftime("%m"))
    time_day = float(a.strftime("%d"))
    newtime.loc[len(newtime)] = [time_year, time_mon, time_day]
hope that helps!
The dataframe is a good option. However, if you want to keep an array, you can simply use numpy's reshape() function. Here is some example code:
import numpy as np
newtime = np.array([])
for i in range(12):
    # Dummy data generated here, using floats like in the original post
    time_year = float(2015.0)
    time_mon = float(1.0*i)
    time_day = float(31.0)
    newtime = np.append(newtime,np.array([time_year, time_mon, time_day]))
newtime = newtime.reshape((-1,3))
Note the argument in the reshape function: (-1,3) will tell numpy to make the second dimension 3, computing automatically the first dimension. Now, if you print newtime, you should see:
[[ 2.01500000e+03 0.00000000e+00 3.10000000e+01]
[ 2.01500000e+03 1.00000000e+00 3.10000000e+01]
[ 2.01500000e+03 2.00000000e+00 3.10000000e+01]
[ 2.01500000e+03 3.00000000e+00 3.10000000e+01]
[ 2.01500000e+03 4.00000000e+00 3.10000000e+01]
[ 2.01500000e+03 5.00000000e+00 3.10000000e+01]
[ 2.01500000e+03 6.00000000e+00 3.10000000e+01]
[ 2.01500000e+03 7.00000000e+00 3.10000000e+01]
[ 2.01500000e+03 8.00000000e+00 3.10000000e+01]
[ 2.01500000e+03 9.00000000e+00 3.10000000e+01]
[ 2.01500000e+03 1.00000000e+01 3.10000000e+01]
[ 2.01500000e+03 1.10000000e+01 3.10000000e+01]]
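Another way to get the (N, 3) layout is to collect one row per timestamp in a plain list, which avoids both the repeated np.append calls and the final reshape. The sketch below uses only the standard library. `seconds_since_1993_to_datetime` is a hypothetical stand-in for the helper in the question, which is not shown there, and it assumes the values are plain seconds since 1993-01-01 00:00 with no leap-second handling.

```python
from datetime import datetime, timedelta

EPOCH_1993 = datetime(1993, 1, 1)


def seconds_since_1993_to_datetime(seconds):
    """Hypothetical stand-in for the question's helper; ignores leap seconds."""
    return EPOCH_1993 + timedelta(seconds=seconds)


# The three sample values quoted in the question.
time = [725696054.99044609, 725696056.99082708, 725696058.99119401]

# Build one [year, month, day] row per timestamp; the list of rows is
# already two-dimensional, so nothing has to be reshaped afterwards.
rows = []
for seconds in time:
    moment = seconds_since_1993_to_datetime(seconds)
    rows.append([float(moment.year), float(moment.month), float(moment.day)])

print(rows)
```

Passing `rows` to np.array once at the end should then give an array of shape (len(time), 3).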

How to convert string datatypes to float in numpy arrays in Python 3

I have been using Python 2.7 for some time now and have recently switched to Python 3. I have already updated my code on some points, but the problem I currently have eludes me. What I am trying to do is to load a dataset using np.loadtxt. Because this data also contains strings, I am importing the full array as strings. I want to do type conversions afterwards to convert some entries to float. This fails miserably and I do not understand why. All I see is that in Python 3 all strings get the prefix 'b', and I have the feeling this has something to do with it, but I cannot find a concise answer. Code and error below.
filename = 'train.csv'
raw_data = open(filename, 'rb')
data = np.loadtxt(raw_data, delimiter=",", dtype = 'str')
dataset = data[1:,1:]
print(dataset)
original_data = dataset
test = float(dataset[0,0])
print(test)
Result
[["b'60'" "b'RL'" "b'65'" ..., "b'WD'" "b'Normal'" "b'208500'"]
["b'20'" "b'RL'" "b'80'" ..., "b'WD'" "b'Normal'" "b'181500'"]
["b'60'" "b'RL'" "b'68'" ..., "b'WD'" "b'Normal'" "b'223500'"]
...,
["b'70'" "b'RL'" "b'66'" ..., "b'WD'" "b'Normal'" "b'266500'"]
["b'20'" "b'RL'" "b'68'" ..., "b'WD'" "b'Normal'" "b'142125'"]
["b'20'" "b'RL'" "b'75'" ..., "b'WD'" "b'Normal'" "b'147500'"]]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-38-c154945cd6f1> in <module>()
5 print(dataset)
6 original_data = dataset
----> 7 test = float(dataset[0,0])
8 print(test)
ValueError: could not convert string to float: "b'60'"
As suggested by dnalow, something goes wrong in the type conversion because I first open the file and then read from it. The solution is not to use open(filename, 'rb') with np.loadtxt, but to use np.genfromtxt instead. Code below.
filename = 'train.csv'
data = np.genfromtxt(filename, delimiter=",", dtype = 'str')
dataset = data[1:,1:]
print(dataset)
original_data = dataset
test = float(dataset[0,0])
print(test)
Result
[['60' 'RL' '65' ..., 'WD' 'Normal' '208500']
['20' 'RL' '80' ..., 'WD' 'Normal' '181500']
['60' 'RL' '68' ..., 'WD' 'Normal' '223500']
...,
['70' 'RL' '66' ..., 'WD' 'Normal' '266500']
['20' 'RL' '68' ..., 'WD' 'Normal' '142125']
['20' 'RL' '75' ..., 'WD' 'Normal' '147500']]
60.0
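The "b'60'" strings in the failing output look like what Python 3 produces when bytes are turned into text with str() instead of being decoded. The following standard-library sketch reproduces that effect in isolation. It does not involve NumPy, so it is only presumably what happened inside np.loadtxt in the question.

```python
raw = b"60"  # the kind of value a file opened in binary mode ('rb') yields

# str() on bytes gives the repr, including the b'' wrapper.
as_text = str(raw)
print(as_text)

try:
    float(as_text)
except ValueError as error:
    print("conversion failed:", error)

# Decoding turns the bytes into the actual text, which converts cleanly.
decoded = raw.decode("utf-8")
print(float(decoded))
```

Reading the file in text mode, or letting NumPy open it by file name as in the accepted fix, avoids the bytes-to-repr step altogether.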
