Weka string attributes for sparse ARFF file - file-format

I am trying to use Weka for text classification.
For this purpose, it makes sense to use the sparse ARFF data file format.
Using Weka 3.7.2, I tried:
1. Transforming a text directory into an Instances object using TextDirectoryLoader.
2. Translating the strings resulting from the first stage into numbers using StringToWordVector.
The first stage worked fine. The second stage caused a problem, described this way in Weka's ARFF file format specification:
Warning: There is a known problem saving SparseInstance objects from datasets that have string attributes. In Weka, string and nominal data values are stored as numbers; these numbers act as indexes into an array of possible attribute values (this is very efficient). However, the first string value is assigned index 0: this means that, internally, this value is stored as a 0. When a SparseInstance is written, string instances with internal value 0 are not output, so their string value is lost (and when the arff file is read again, the default value 0 is the index of a different string value, so the attribute value appears to change).
The ARFF specification suggests this solution:
To get around this problem, add a dummy string value at index 0 that is never used whenever you declare string attributes that are likely to be used in SparseInstance objects and saved as Sparse ARFF files.
I am trying to do just that - add a dummy string. I have failed to do this manually (by editing the ARFF file). Could anyone who has already done this post an example - either a program segment that does it, a properly modified ARFF file, or some other way to achieve this?
Thanks.

Do NOT edit the arff file directly.
I just answered a similar question here:
Weka printing sparse arff file
Use the same code example.
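If it helps, the gist of that workaround in code form is roughly the following. This is a minimal sketch against the Weka 3.7-style Java API; the class, attribute, string values and file name are placeholders of mine, not the code from the linked answer:
import java.io.File;
import java.util.ArrayList;
import weka.core.Attribute;
import weka.core.Instances;
import weka.core.SparseInstance;
import weka.core.converters.ArffSaver;

public class SparseStringDemo {
    public static void main(String[] args) throws Exception {
        // Passing a null value list declares a string attribute.
        ArrayList<Attribute> atts = new ArrayList<Attribute>();
        atts.add(new Attribute("text", (ArrayList<String>) null));
        Instances data = new Instances("demo", atts, 0);

        // The dummy value occupies internal index 0, so no real string
        // is ever stored as 0 and lost by the sparse representation.
        data.attribute(0).addStringValue("_dummy_");

        // Real values start at index 1.
        double[] vals = new double[data.numAttributes()];
        vals[0] = data.attribute(0).addStringValue("some document text");
        data.add(new SparseInstance(1.0, vals));

        // Save the data set; SparseInstance objects are written in the sparse {index value} form.
        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File("demo.arff"));
        saver.writeBatch();
    }
}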

Related

Combine references to create new reference like ${var${randnum}}

I am trying to create a new reference containing another reference as in ${var${randnum}}.
Ultimately, I want to create a variable which refers to a two times two randomized set of variables.
As the above approach did not work, I developed it further, with the result below.
In the calculate field I write
concat('$','{','trust',${rand_no2},'_' ,${rand_no3_1},'}')
Which should result in
${trust1_1}
and respective combinations.
Without line 11 (name=ref2) the file compiles and I can start it in ODK Collect (v.2.4) on my phone. When I reach line 10 (in ODK Collect), however, I receive the message:
"Error Occured
Dependency cycle in s; recursion limit exceeded!!"
(I included line 11 to show what I want to do in the end.)
I am writing the file in Excel and compile it with ODK xlsform offline. (For testing I transfer it via cable to my phone.)
The xls file for reproduction can be found here:
https://forum.getodk.org/t/concatenate-references-to-create-new-reference-var-randnum/34968
Thank you very much in advance!
You're mixing up a few things related to the ${q} syntax, question names, and question values.
Note that ODK Collect does not actually understand the ${q} syntax (which is XLSForm-only). It helps to look at the actual form format that ODK Collect understands, called XForm, an XML format that XLSForm is converted into. However, even if ODK Collect understood the ${q} syntax, your approach still wouldn't work, since you're creating a string value for the ref question (using concat). That string wouldn't magically be evaluated as a reference or formula; you cannot dynamically create a reference or formula.
At the moment (until ODK supports something like the local-name() function), the best approach is probably to use position and put the calculated values inside a group. Something like //group/calc[number(${pos})], perhaps. Note that positions are 1-based (so the first item is at position 1) and that casting the position to a number or integer is required.

Array parameters in FMI 2.0

In FMI 2.0, array parameters are serialized to scalar variables.
Importing tools can display them as arrays, but their size is fixed and their handling is inefficient.
Better array support is currently in development by a working group of the FMI project, but I would like to know about workarounds for handling array parameters in the meantime.
Ideas are to
hard code them (disadvantage: they are no longer parameters ...)
put them in a CSV file in the resources folder and read them at the start of the simulation (disadvantage: no parameter mask support, complicated)
put them in a string parameter and parse it at simulation start (disadvantage: limited length of strings, complicated)
Are there other ideas / workarounds? Thanks in advance.
Combinations of the ideas outlined in your question are also possible.
Hard code with selector parameter
Here the idea is to hard code several variants of your array and allow the user to select one with a parameter.
I did this in a recent project where a user needed to choose between different spatially resolved initial conditions (e.g. temperature profiles). We used a model to generate more than 100 different sets of spatially resolved initial conditions (each representing a different "history" of the modeled object), hard coded them as FORTRAN arrays (the inner core of the FMU was in FORTRAN), and used a single integer parameter to select which profile to use.
It worked very well, and the user has no way of breaking it.
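A minimal Modelica sketch of the pattern (the array contents, sizes and names here are made up for illustration; in the project above the same idea was implemented in FORTRAN):
model ProfileSelector "Hard-coded array variants selected by a single integer parameter"
  parameter Integer profileId(min=1, max=3) = 1 "Selects one of the hard-coded profiles";
  // Hypothetical hard-coded variants of the array, one per row.
  final parameter Real profiles[3, 4] = [293.15, 295.0, 300.0, 310.0;
                                         280.0,  285.0, 290.0, 295.0;
                                         320.0,  318.0, 315.0, 310.0];
  // The array actually used by the model; only profileId is exposed for the user to set.
  final parameter Real T_init[4] = profiles[profileId, :];
end ProfileSelector;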
Shorten the array and interpolate
If the data in your array is smooth, you might be able to dramatically reduce the number of values you actually need to pass to your simulation - which would make serialization into scalar parameters less painful.
Within the FMU, interpolate to get the resolution you need.
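For instance (the number of support points and their values below are arbitrary, just to illustrate the idea):
model InterpolatedProfile "Few serialized support points, fine grid rebuilt inside the FMU"
  parameter Real xSupport[5] = {0, 0.25, 0.5, 0.75, 1} "Support positions";
  parameter Real ySupport[5] = {293.15, 300, 310, 305, 295} "Values at the support positions";
  final parameter Integer n = 101 "Resolution actually needed inside the FMU";
  final parameter Real xFine[n] = linspace(0, 1, n);
  // Rebuild the fine-grained array by interpolating between the support points.
  final parameter Real yFine[n] = {Modelica.Math.Vectors.interpolate(xSupport, ySupport, xFine[i]) for i in 1:n};
end InterpolatedProfile;
Only the few scalars in xSupport and ySupport then need to be serialized as parameters instead of the full fine-grained array.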
String parameter to select csv file
You can use a string parameter to provide the path to a user-provided csv-file. I would not recommend this, because the user will most likely break it.

Modelica Flexible Array Size - Error: Failed to expand the variable

I still have problems when using arrays of undefined size in Modelica.
It would be really helpful to understand what the underlying problem is.
I have read a lot of material (known size at compile time, functions, constructors and destructors, records and so on), but I'm still not sure what the proper way to use flexible arrays in Modelica is.
Can somebody give a good explanation, maybe using the following example?
Given data:
x y
1 7
2 1
3 1
4 7
Modelica model which works fine:
model Test_FixedMatrix
  parameter Real A[4,2] = [1,7;2,1;3,1;4,7];
  parameter Integer nA = size(A,1);
  parameter Integer iA[:] = Modelica.Math.BooleanVectors.index( {A[i,2] == 1 for i in 1:nA});
  parameter Real Aslice[:,:] = A[iA,:];
end Test_FixedMatrix;
The array iA[:] gives the indices of all rows having y = 1, and Aslice[:,:] collects all these rows in one matrix.
Now instead of using a given matrix, I want to read the data from a file. Using the Modelica library ExternData (https://github.com/tbeu/ExternData) it is possible to read the data from an xlsx-file:
model Test_MatrixFromFile
  ExternData.XLSXFile xlsxfile(fileName="data/ExampleData.xlsx");
  parameter Real B[:,:] = xlsxfile.getRealArray2D("A2","Tabelle1",4,2);
  parameter Integer nB = size(B,1);
  parameter Integer iB[:] = Modelica.Math.BooleanVectors.index( {B[i,2] == 1 for i in 1:nB});
  parameter Real Bslice[:,:] = B[iB,:];
end Test_MatrixFromFile;
The xlsx-file looks like this:
It's exactly the same code, but this time I get the error message:
(If I delete the two lines with iB and Bslice then it works, so the problem is not the xlsxfile.getRealArray2D function.)
Application example and further questions:
It would be really nice to be able to just read a data file into Modelica and then prepare the data directly within the Modelica code, without using any other software. Of course I can prepare all the data using e.g. a Python script and then use TimeTable components... but sometimes this is not really convenient.
Is there another way to read in data (txt or csv) without specifying the size of the table in it?
What would it look like if I wanted to read in data from different files, e.g. every 10 seconds?
I also found the Modelica functions readFile and countLines in Modelica.Utilities.Streams. Is it maybe only possible to do this using external functions?
I assume the fundamental issue is that Modelica.Math.BooleanVectors.index returns a vector of variable size. The size depends on the input of the function, which in the given case is the xlsx file. Therefore the size of iB depends on the content of the xlsx file, which can change independently of the model itself. This is exactly the "known size at compile time" problem you mentioned: the size of iB depends on something that is not a parameter.
Another thing I learned from my experience with Dymola is that it behaves differently if functions like ExternData.XLSXFile are translated before usage. This means that you need to open the function in Dymola (double-clicking it in the package browser) and press the translate button (or F9). This will generate the respective .exe file in your working directory and make Dymola more "confident" that the outputs of the function are parameters rather than variables. I don't know the details of the implementation and therefore can't give exact details, but it is worth a try.

Adding numbers from 3 different CSV columns

I am new to programming and I am trying to figure out how I can add up all of the numbers in the B column, all of the C column, and all of the D column, storing the sums in variables named x, y, and z respectively. I have been searching everywhere for an answer. All I have managed so far is to read the first line of the CSV file.
This is a screenshot of part of the CSV file:
Once you've read it in, you need to somehow split each line up into actual columns. Check this function; it should help:
strtok()
Then you need to store the fields in some kind of array, pick out the elements you want, and add them to a cumulative sum. This might require a while loop that runs until you've reached the last line of the file.
I will leave the actual code to you, but this should get you going.
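For what it's worth, a rough sketch of that approach could look like the following. The file name "data.csv" and the assumption that columns B, C and D are the 2nd, 3rd and 4th comma-separated fields are mine, not from the question:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* "data.csv" is a placeholder file name. */
    FILE *fp = fopen("data.csv", "r");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }

    char line[1024];
    double x = 0.0, y = 0.0, z = 0.0;  /* running sums of columns B, C, D */

    /* Read one line at a time until the end of the file. */
    while (fgets(line, sizeof line, fp) != NULL) {
        int col = 0;
        /* Split the line on commas; note that strtok() collapses empty fields. */
        for (char *tok = strtok(line, ",\n"); tok != NULL; tok = strtok(NULL, ",\n"), col++) {
            if (col == 1)      x += atof(tok);  /* column B */
            else if (col == 2) y += atof(tok);  /* column C */
            else if (col == 3) z += atof(tok);  /* column D */
        }
    }
    fclose(fp);

    printf("x = %f, y = %f, z = %f\n", x, y, z);
    return 0;
}
A header row, if present, simply contributes 0 to each sum, because atof() returns 0 for non-numeric text.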

Extract values from Array{Any} in Julia

I have a text file containing special characters, text and some numbers, and I need to extract from it some values which appear every n-th row. Since the file has around 20k rows, I want the algorithm to find the first and each following relevant row. I've read the text file into a matrix with readdlm(), but the element type of the array is Any, and findfirst() gives an error: "access to undefined reference". Could you give me some guidance, please?
Regards
Mike
@Jubobs Here is the results file:
https://copy.com/i9GeXhK0qHdfwpkT
I need to extract the values for node 79, which starts at row 10.
So I want the algorithm to find the row file[10,1], get the value file[10,2], then the next occurrence, and so on.
file=readdlm("results.txt")
findfirst(file,"79")
access to undefined reference
while loading In[13], in expression starting on line 1
in findnext at array.jl:1034 (repeats 2 times)
Your best bet would be to read the file as a string and search through it using a regex (or manually if you prefer). It doesn't make much sense to open your txt file as a delimited one (that is probably where your error stems from). Instead, do the following:
f = open("results.txt") # f <: some kind of IO object
data = readall(f) # data <: AbstractString
close(f)
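From there a regex can pull out the node-79 rows. A rough sketch, assuming each such row looks like a node id followed by whitespace and a value (that layout is my guess, not taken from your file):
# One match per row that starts with node id 79; captures[1] is the value next to it.
for m in eachmatch(r"^\s*79\s+(\S+)"m, data)
    println(m.captures[1])
end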
I hope this helps.
