Inserting a Matlab float array into a PostgreSQL float[] column

I am using JDBC to access a PostgreSQL database through Matlab, and have gotten hung up trying to insert a set of values that I would rather store as an array than as individual values. The Matlab code that I'm using is as follows:
insertCommand = 'INSERT INTO neuron (classifier_id, threshold, weights, neuron_num) VALUES (?,?,?,?)';
statementObject = dbHandle.prepareStatement(insertCommand);
statementObject.setObject(1,1);
statementObject.setObject(2,output_thresholds(1));
statementObject.setArray(3,dbHandle.createArrayOf('"float8"',outputnodes(1,:)));
statementObject.setObject(4,1);
statementObject.execute;
close(statementObject);
Everything functions properly except for the line dealing with arrays. The object outputnodes is a <5x23> double matrix, and I'm attempting to insert its first <1x23> row into my table.
I've tried several different combinations of names and quotes for the '"float8"' part of the createArrayOf call, but I always get this error:
??? Java exception occurred:
org.postgresql.util.PSQLException: Unable to find server array type for provided name "float8".
at org.postgresql.jdbc4.AbstractJdbc4Connection.createArrayOf(AbstractJdbc4Connection.java:82)
at org.postgresql.jdbc4.Jdbc4Connection.createArrayOf(Jdbc4Connection.java:19)
Error in ==> Databasetest at 22
statementObject.setArray(3,dbHandle.createArrayOf('"float8"',outputnodes(1,:)));

Performance of JDBC connector for arrays
I'd like to note that if you have to export rather big volumes of data containing arrays, JDBC may not be the best choice. Firstly, its performance degrades due to the overhead caused by the conversion of native Matlab arrays into org.postgresql.jdbc.PgArray objects. Secondly, this may lead to a shortage of Java heap memory (and simply increasing the Java heap size may not be a panacea). Both points can be seen in the figure below, which illustrates the performance of the datainsert method from the Matlab Database Toolbox (it works with PostgreSQL through a direct JDBC connection).
The blue graph displays the performance of the batchParamExec command from the PgMex library (see https://pgmex.alliedtesting.com/#batchparamexec for details). The endpoint of the red graph corresponds to the maximum data volume that datainsert passes into the database without any error; a data volume greater than that maximum causes an "out of Java heap memory" problem (the Java heap size is specified at the top of the figure). For further details of the experiments, please see the following paper with full benchmarking results for data insertion.
Example reworked
As can be seen, PgMex, which is based on libpq (the official C application programming interface to PostgreSQL), has greater performance and is able to process data volumes of more than 2 GB.
Using this library, your code can be rewritten as follows. We assume below that all the parameters marked by <> signs are properly filled in; that the table neuron already exists in the database with fields classifier_id of type int4, threshold of type float8, weights of type float8[] and neuron_num of type int4; and that the variables classifierIdVec, output_thresholds, outputnodes and neuronNumVec are already defined as numerical arrays of the sizes shown in the comments in the code below. If the types of the table fields differ, adjust the type specification in the last command accordingly:
% Create the database connection
dbConn = com.allied.pgmex.pgmexec('connect',[...
'host=<yourhost> dbname=<yourdb> port=<yourport> '...
'user=<your_postgres_username> password=<your_postgres_password>']);
insertCommand = ['INSERT INTO neuron '...
'(classifier_id, threshold, weights, neuron_num) VALUES ($1,$2,$3,$4)'];
SData = struct();
SData.classifier_id = classifierIdVec(:); % [nTuples x 1]
SData.threshold = output_thresholds(:); % [nTuples x 1]
SData.weights = outputnodes; % [nTuples x nWeights]
SData.neuron_num = neuronNumVec; % [nTuples x 1]
com.allied.pgmex.pgmexec('batchParamExec',dbConn,insertCommand,...
'%int4 %float8 %float8[] %int4',SData);
Note that outputnodes does not need to be cut along its rows into separate arrays, because the rows are all of the same length. If the arrays for different tuples have different sizes, they must be passed as a column cell array, with each cell containing its own array for the corresponding tuple.
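For illustration, a minimal sketch of that variable-length case (the numbers are made up; the field specification string and the batchParamExec call stay the same as above):
% three tuples whose weight vectors have different lengths
SData.weights = {[0.1 0.2 0.3]; [0.4 0.5]; [0.6]}; % [nTuples x 1] cell array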
EDIT: Currently PgMex has free academic licensing.

I was getting confused by the documentation, which consistently used double quotes; Matlab doesn't allow those around string literals, and using only single quotes actually resolved this (with '"float8"', the driver looks for a type literally named "float8", quotes included). The correct line was:
statementObject.setArray(3,dbHandle.createArrayOf('float8',outputnodes(1,:)));
instead of
statementObject.setArray(3,dbHandle.createArrayOf('"float8"',outputnodes(1,:)));
I originally thought that the problem was that the alias I was using for double precision was incorrect, but as Craig pointed out in the comment above, this isn't the case.
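For reference, the same insert in plain JDBC looks like the following minimal sketch (the connection details and values are illustrative, not from the original code):
import java.sql.*;

public class InsertNeuron {
    public static void main(String[] args) throws SQLException {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO neuron (classifier_id, threshold, weights, neuron_num) VALUES (?,?,?,?)");
        ps.setInt(1, 1);
        ps.setDouble(2, 0.5);
        // the element type name is passed bare, without embedded quotes
        ps.setArray(3, conn.createArrayOf("float8", new Double[]{0.1, 0.2, 0.3}));
        ps.setInt(4, 1);
        ps.execute();
        ps.close();
        conn.close();
    }
}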

Related

Xarray: is there a neat way to reduce entire dataset dimension size x and y without changing datatype for variables?

I'm trying to reduce my xarray dataset dimension (x and y) by one, e.g. 257x257 to 256x256.
This code is what I have tried:
if cube.dims['x'] > patch_size:
    cube = cube.where((cube.y < cube.y.data.max()) & (cube.x < cube.x.max()), drop=True)
When I run this code, all variables change their data type to float64 (probably because where() fills the dropped-out positions with NaN while selecting the data).
Is there a better way of doing this without changing the data type to float64?
Rather than masking and dropping the data with where, simply select the data you want along the x and y dimensions. This will be faster than masking with where, and will get around the int->float type promotion problem you're seeing.
You could do this a number of ways, e.g. with .sel for label-based indexing or .isel for positional indexing. Since you're trying to extract all but the last indices, I'll use isel with slice:
cube = cube.isel(y=slice(None, -1), x=slice(None, -1))
This could also be done with sel, using filtered coordinates as you have in your question:
cube.sel(x=(cube.x < cube.x.max()), y=(cube.y < cube.y.max()))
Both of these methods work with both DataArrays and Datasets.
If you're working with a DataArray, you could also do this by referring to the dimensions positionally with .loc - just be careful when doing this because xarray doesn't always preserve dimension ordering:
cube = cube.loc[:-1, :-1]
Referring to dimensions by order like this is not allowed for datasets.
See the xarray docs on Indexing and Selecting Data for more info on this topic.
Note on type promotion with .where
While the above is going to perform better for your specific case, you've raised an interesting issue about what I think is an unnecessary limitation of DataArray.where. I think there's no reason why you couldn't set a custom fill value (the other argument) when using .where with drop=True. Currently, this is prohibited by an assertion in xarray - I've raised an issue here to see if we can fix this: gh#6466.
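To make the difference concrete, here is a minimal runnable sketch (toy 3x3 data standing in for the 257x257 cube; the variable names are illustrative):
import numpy as np
import xarray as xr

cube = xr.Dataset(
    {"band": (("y", "x"), np.arange(9, dtype="int32").reshape(3, 3))},
    coords={"y": [0, 1, 2], "x": [0, 1, 2]},
)

# positional trim keeps the original dtype
print(cube.isel(y=slice(None, -1), x=slice(None, -1))["band"].dtype)  # int32

# masking with drop=True fills with NaN first, promoting to float64
masked = cube.where((cube.y < cube.y.max()) & (cube.x < cube.x.max()), drop=True)
print(masked["band"].dtype)  # float64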

How to create an Array (IDataHolder) in Thinkscript? (Thinkorswim)

I am trying to create an irregular volume scanner on Thinkorswim using Thinkscript. I want to create an array of volumes from past periods so that I can compare them to the current period's volume (using fold or recursion). However, while the Thinkorswim documentation describes an IDataHolder datatype, which is an array of data, I cannot find how one can actually create one, as opposed to just referencing the historical data held by Thinkorswim. Here is the documentation: https://tlc.thinkorswim.com/center/reference/thinkScript/Data-Types/IDataHolder
I have tried coding something as simple as this to initialize an array:
def array = [];
This throws an error. I have tried different types of brackets, changing any possible syntax issues, etc.
Is this possible in the Thinkscript language? If not, are there any workarounds? If not even that, is there a third party programming interface that I could use to pull data from Thinkorswim and get a scanner that way? Thanks for any help.
IDataHolder represents data such as close, open, volume, etc., that is held across multiple bars or ticks. You can reference one of these pre-defined data series, or you can create your own using variables: def openPlus5cents = open + 0.05, say, would be an IDataHolder-type value.
There's no way to create an array in the usual programming sense, as you've found, so you'll have to be a little creative. Perhaps you could do the comparison within the fold itself, with volume[1] > volume or the like, as in the sketch below. Maybe post another question with an example of the comparison you're trying to do?
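For example, here is a hedged sketch of an irregular-volume scan that treats the built-in volume series as the "array", reading past bars with GetValue inside a fold (the 20-bar window, the 1.5 multiplier, and the names maxPastVol and spike are arbitrary illustrative choices):
# highest volume among the previous 20 bars
def maxPastVol = fold i = 1 to 21 with m = 0 do Max(m, GetValue(volume, i));
# flag bars whose volume exceeds that maximum by 50%
plot spike = volume > 1.5 * maxPastVol;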

How can I read a parameter array from excel so that its values are available when stocks are initialized?

The problem:
I need to read a row of values from an excel spreadsheet into a parameter array and use these values to set the initial values of stocks.
The specifics:
a. I can successfully set the scalar parameters' default values from Excel using ExcelControlName.getCellNumericValue("ExcelSheetName", RowNumber, ColumnNumber).
b. Trying to set default values for array parameters with ExcelControlName.readHyperArray(DestinationArrayName, "ExcelSheetName", RowNumber, ColumnNumber, false) returns a "Cannot return a void result" error on build.
c. I can read the parameter arrays from a function called from the agent's "On startup:" section using ExcelControlName.readHyperArray(DestinationArrayName, "ExcelSheetName", RowNumber, ColumnNumber, false).
d. Stocks whose initial value is set to the parameter array that the function successfully loaded are all zero at run time, even though the parameter array shows values (Initial value: ParameterArrayName).
e. When I set the parameter array values through the value editor the stocks initialize correctly.
My suspicion:
I'm thinking that the issue has something to do with the internal timing within the entrails of the model generated by Anylogic such that the function to load the parameters is executed after the stocks get initial values - but that could just be the delirium caused by repeatedly smashing my forehead into the wall. But, if this is indeed the case, how can I sneak the function in earlier or, better yet, how would someone who actually knows what they're doing accomplish this?
What I'm trying to accomplish:
As a septuagenarian with lots of time on my hands and a vague recollection of dynamic modeling using Dynamo from a Systems Science program in the early seventies (and not used since), I thought I'd take a whack at age-based modeling of the COVID-19 pandemic. I wanted to see, among other things, whether establishing elder-prisons (in now vacant Club-Meds, Sandals... I'm sure) would be an economically advantageous strategy. Getting there requires dis-aggregating classic SIR approaches into age-specific chains of causality. So far, I'm at 27 age-specific parameters for eight age-groups and 24 scalar parameters. As much as I'd like to type and retype... all this data I'm really hoping there is a better way.
I must say that I am amazed at how far modeling has come in only 50 or so years and I am enthralled with Anylogic's application - even though it's a bit java-ish for me.
Thank you ever so much,
Carl Cottrell
Equivalent parameter arrays: one with values entered in the editor, the other read from an Excel file through a function.
Not sure if I understand, but here goes:
I understand that you have the following structure:
and you want to initialize the stock as follows.
To do that, in the initial value of your parameter, use a function that returns a HyperArray:
The function getInitialValue should return a HyperArray, and I think this code should work for you (you have to choose RowNumber according to whatever is in your Excel file, MyDimension is the name of your dimension, and ExcelControlName is the Excel file control that holds the initial values of the stock):
HyperArray x = new HyperArray(MyDimension);
// numColumns is assumed to be defined elsewhere to match the dimension's size
for (int i = 0; i <= numColumns; i++) {
    x.set(ExcelControlName.getCellNumericValue("ExcelSheetName", RowNumber, i), i);
}
return x;

Are there Erlang arrays "with a defined representation"?

Context:
Erlang programs running on heterogeneous nodes, retrieving and storing data from Mnesia databases. These database entries are meant to be used for a long time (e.g. across multiple Erlang version releases) and remain in the form of Erlang objects (i.e. no serialization). Among the information stored, there are currently two uses for arrays:
Large (up to 16384 elements) arrays. Fast access to an element using its index was the basis for choosing this type of collection. Once the array has been created, the elements are never modified.
Small (up to 64 elements) arrays. Accesses are mostly done using indices, but there are also some iterations (foldl/foldr). Both reading and replacement of the elements happen frequently. The size of the collection remains constant.
Problem:
Erlang's documentation on arrays states that "The representation is not documented and is subject to change without notice." Clearly, arrays should not be used in my context: database entries containing arrays may be interpreted differently depending on the node executing the program, and unannounced changes to how arrays are implemented would make them unusable.
I have noticed that Erlang features "ordsets"/"orddict" to address a similar issue with "sets"/"dict", and am thus looking for the "array" equivalent. Do you know of any? If none exists, my strategy is likely going to be using lists of lists to replace my large arrays, and orddict (with the index as key) to replace the smaller ones. Is there a better solution?
An array is a tuple of nested tuples and integers, with each tuple being a fixed size of 10 and representing a segment of cells. Where a segment is not currently used, an integer (10) acts as a placeholder. This, without the abstraction, is I suppose the closest equivalent. You could indeed copy the array module from OTP into your own app, and it would thus be a stable representation.
As to what you should use instead of array, it depends on the data and what you will do with it. If the data that would be in your array is fixed, then a tuple makes sense: it has constant access time for reads/lookups. Otherwise a list sounds like a winner, be it a list of lists, a list of tuples, etc. However, once again, that's a shot in the dark, because I don't know your data or how you use it. For the small index-based collections, the orddict approach from the question could also work, as in the sketch below.
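A minimal sketch of that orddict idea (1-based integer keys; the values are made up):
%% build a 64-cell "array" with every cell set to 0.0
D0 = orddict:from_list([{I, 0.0} || I <- lists:seq(1, 64)]),
%% replace element 17, then read it back
D1 = orddict:store(17, 3.14, D0),
Val = orddict:fetch(17, D1),
%% iterate over all elements, e.g. summing them (covers the foldl/foldr uses)
Sum = orddict:fold(fun(_I, V, Acc) -> Acc + V end, 0.0, D1).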
See the implementation here: https://github.com/erlang/otp/blob/master/lib/stdlib/src/array.erl
Also see Robert Virding's answer on the implementation of array here: Arrays implementation in erlang
And what Fred Hebert says about arrays in A Short Visit to Common Data Structures
An example showing the structure of an array:
1> A1 = array:new(30).
{array,30,0,undefined,100}
2> A2 = array:set(0, true, A1).
{array,30,0,undefined,
{{true,undefined,undefined,undefined,undefined,undefined,
undefined,undefined,undefined,undefined},
10,10,10,10,10,10,10,10,10,10}}
3> A3 = array:set(19, true, A2).
{array,30,0,undefined,
{{true,undefined,undefined,undefined,undefined,undefined,
undefined,undefined,undefined,undefined},
{undefined,undefined,undefined,undefined,undefined,
undefined,undefined,undefined,undefined,true},
10,10,10,10,10,10,10,10,10}}
4>

Modelica Flexible Array Size - Error: Failed to expand the variable

I still have problems when using arrays of undefined size in Modelica, and it would be really helpful to understand the underlying problem.
I have read a lot of material (known size at compile time, functions, constructors and destructors, records and so on), but I'm still not sure what the proper way to use flexible arrays in Modelica is.
Can somebody give a good explanation, maybe using the following example?
Given data:
x y
1 7
2 1
3 1
4 7
Modelica model which works fine:
model Test_FixedMatrix
  parameter Real A[4,2] = [1,7;2,1;3,1;4,7];
  parameter Integer nA = size(A,1);
  parameter Integer iA[:] = Modelica.Math.BooleanVectors.index({A[i,2] == 1 for i in 1:nA});
  parameter Real Aslice[:,:] = A[iA,:];
end Test_FixedMatrix;
The array iA[:] gives the indices of all rows having y = 1, and Aslice[:,:] collects all these rows in one matrix.
Now instead of using a given matrix, I want to read the data from a file. Using the Modelica library ExternData (https://github.com/tbeu/ExternData) it is possible to read the data from an xlsx-file:
model Test_MatrixFromFile
  ExternData.XLSXFile xlsxfile(fileName="data/ExampleData.xlsx");
  parameter Real B[:,:] = xlsxfile.getRealArray2D("A2","Tabelle1",4,2);
  parameter Integer nB = size(B,1);
  parameter Integer iB[:] = Modelica.Math.BooleanVectors.index({B[i,2] == 1 for i in 1:nB});
  parameter Real Bslice[:,:] = B[iB,:];
end Test_MatrixFromFile;
The xlsx-file contains exactly the same x/y table as shown above.
It's exactly the same code, but this time I get the error message "Failed to expand the variable".
(If I delete the two lines with iB and Bslice, then it works, so the problem is not the xlsxfile.getRealArray2D function.)
Application example and further questions:
It would be really nice to be able to just read a data file into Modelica and then prepare the data directly within the Modelica code, without using any other software. Of course I can prepare all the data using e.g. a Python script and then use TimeTable components... but sometimes this is not really convenient.
Is there another way to read in data (txt or csv) without specifying the size of the table in it?
How would it look like, if I want to read-in data from different files e.g. every 10 seconds?
I also found the Modelica functions readFile and countLines in Modelica.Utilities.Streams. Is it maybe only possible to do this using external functions?
I assume the fundamental issue is that Modelica.Math.BooleanVectors.index returns a vector of variable size. The size depends on the input of the function, which in the given case comes from the xlsx file. The size of iB therefore depends on the content of the xlsx file, which can change independently of the model itself. So the point you originally stated, "known size at compile time", is exactly the problem: the size of iB depends on something that is not a structural parameter. One possible workaround is sketched below.
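As a hedged sketch (tool support varies, and getSlice is an illustrative name, not part of any library): array sizes that are unknown at compile time are generally allowed inside functions, so the slicing can be moved into one:
function getSlice "Return the rows of B whose second column equals 1"
  input Real B[:,:];
  output Real Bslice[Modelica.Math.BooleanVectors.countTrue({B[i,2] == 1 for i in 1:size(B,1)}), size(B,2)];
algorithm
  Bslice := B[Modelica.Math.BooleanVectors.index({B[i,2] == 1 for i in 1:size(B,1)}), :];
end getSlice;
In the model, parameter Real Bslice[:,:] = getSlice(B); may then translate, although whether a parameter of unknown size is accepted at the model level still depends on the tool.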
Another thing that I learned from my experience with Dymola is that it behaves differently if functions like those in ExternData.XLSXFile are translated before usage. This means that you need to open the function in Dymola (by double-clicking it in the package browser) and press the translate button (or F9). This generates the respective .exe file in your working directory and makes Dymola more "confident" that the outputs of the function are parameters rather than variables. I don't know the details of the implementation and therefore can't give exact details, but it is worth a try.
