Multi-dimensional data structure management in R - arrays

I have a question about data organisation and the best way to simplify some multi-layered data. In short, I have about 10 replicate small wood beams (BeamID, ~10), each subjected to about 10 different treatments (TreatID, ~10). Each beam is load tested, which produces a data series of Load with the corresponding Displacement (ranging from 10 to 50 rows per test; I have code that corrects for disparities in row length). Each wood beam is tested multiple times (Rep, ~10).
My plan was to lump all this data into a 5-D array:
Array[Load, Deflection, BeamID, TreatID, Rep]
This way, I should be able to plot the load~deflection curves for a given BeamID, TreatID, for all Reps by using Array[ , ,1,1, ], right? So the hypothetical output for Array[ , ,1,1,1], would be:
+------------+--------+-----+
| Deflection | Load | Rep |
+------------+--------+-----+
| 0 | 0 | 1 |
| 6.35 | 10.5 | 1 |
| 12.7 | 20.8 | 1 |
| 19.05 | 45.3 | 1 |
| 25.4 | 75.2 | 1 |
+------------+--------+-----+
And Array[ , ,1,1,2] would be:
+------------+--------+-----+
| Deflection | Load | Rep |
+------------+--------+-----+
| 0 | 0 | 2 |
| 7.3025 | 12.075 | 2 |
| 14.605 | 23.92 | 2 |
| 21.9075 | 52.095 | 2 |
| 29.21 | 86.48 | 2 |
+------------+--------+-----+
Or I think I could keep it as a simpler, 'melted' dataframe, which would have columns for Load and Deflection, and BeamID, TreatID, and Rep would be repeated for each row of the test output.
+------------+--------+-----+--------+---------+
| Deflection | Load | Rep | BeamID | TreatID |
+------------+--------+-----+--------+---------+
| 0 | 0 | 1 | 1 | 1 |
| 6.35 | 10.5 | 1 | 1 | 1 |
| 12.7 | 20.8 | 1 | 1 | 1 |
| 19.05 | 45.3 | 1 | 1 | 1 |
| 25.4 | 75.2 | 1 | 1 | 1 |
| 0 | 0 | 2 | 1 | 1 |
| 7.3025 | 12.075 | 2 | 1 | 1 |
| 14.605 | 23.92 | 2 | 1 | 1 |
| 21.9075 | 52.095 | 2 | 1 | 1 |
| 29.21 | 86.48 | 2 | 1 | 1 |
+------------+--------+-----+--------+---------+
However, with the latter, I'm not sure how I could easily and discretely pull out all the Rep test values for a specific BeamID and TreatID, especially since I use a linear model to fit a 3rd-order polynomial to a specific test to extract the slope of the curve. Having it as one continuous dataframe means I'd have to specify starting and stopping points for the linear model, correct?
Thoughts, suggestions? Am I headed in the right direction in using a 5-D array? R is a new programming language for me, so please pardon my misunderstandings.
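To illustrate why the 'melted' layout does not need start and stop row positions, here is a minimal sketch in Python (standard library only, not R) that selects tests by the values of BeamID, TreatID and Rep. The function name `group_tests` and the tuple layout are assumptions made for this illustration; the numbers are copied from the table above. In R, the same idea corresponds to subsetting on column values (for example with base `subset()` or `split()`) rather than on row indices.

```python
from collections import defaultdict

# Long-format rows: (Deflection, Load, Rep, BeamID, TreatID), copied from the table above.
rows = [
    (0.0, 0.0, 1, 1, 1),
    (6.35, 10.5, 1, 1, 1),
    (12.7, 20.8, 1, 1, 1),
    (19.05, 45.3, 1, 1, 1),
    (25.4, 75.2, 1, 1, 1),
    (0.0, 0.0, 2, 1, 1),
    (7.3025, 12.075, 2, 1, 1),
    (14.605, 23.92, 2, 1, 1),
    (21.9075, 52.095, 2, 1, 1),
    (29.21, 86.48, 2, 1, 1),
]


def group_tests(rows):
    """Group (Deflection, Load) pairs by the key (BeamID, TreatID, Rep)."""
    tests = defaultdict(list)
    for deflection, load, rep, beam_id, treat_id in rows:
        tests[(beam_id, treat_id, rep)].append((deflection, load))
    return dict(tests)


tests = group_tests(rows)

# All reps for BeamID 1, TreatID 1, selected by value rather than by row position.
curves = {key[2]: curve for key, curve in tests.items() if key[:2] == (1, 1)}
for rep, curve in sorted(curves.items()):
    print(rep, len(curve), curve[-1])
```

Each entry of `curves` is one complete load test, so a per-test model fit could loop over the entries without knowing where a test starts or ends in the full table.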

Related

Best way of storing enumerated fields with ability to change order Postgres

What is the best way to store enumerated fields so that their order can be changed?
Let's say my database looks like this:
| Table |
|---------------------|
| id | name | order|
| 1 | 1st | 1 |
| 2 | 2nd | 2 |
| 3 | 3rd | 3 |
| 4 | 4th | 4 |
Now, when the user changes the order like this:
| Table |
|---------------------|
| id | name | order|
| 1 | 1st | 1 |
| 4 | 4th | 2 |
| 2 | 2nd | 3 |
| 3 | 3rd | 4 |
Here I would have to update all rows in this table.
I am considering two solutions.
Solution 1)
When inserting row X between, for example, order 2 and order 3, I would set row X's order field to 2.5, i.e. the number midway between the adjacent orders.
The table above would then look like this:
| Table |
|---------------------|
| id | name | order|
| 1 | 1st | 1 |
| 4 | 4th | 1.5 |
| 2 | 2nd | 2 |
| 3 | 3rd | 3 |
Then, after (for example) 16 changes, I would update the table and normalize all the order fields, so that after normalization the table would look like this:
| Table |
|---------------------|
| id | name | order|
| 1 | 1st | 1 |
| 4 | 4th | 2 |
| 2 | 2nd | 3 |
| 3 | 3rd | 4 |
Solution 2)
I am also considering adding a "next" field (or "next" and "prev" fields) to each row, but that looks to me like a waste of memory.
I really don't want to update the whole table every time somebody changes the order. What is the best way to solve this problem?
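As a rough illustration of Solution 1, the sketch below (in Python, on an in-memory list rather than a Postgres table) moves one row by giving it the midpoint of its new neighbours' order values and later renumbers everything. The helper names `midpoint_order` and `normalize` are hypothetical and exist only for this example.

```python
def midpoint_order(before, after):
    """Return an order value strictly between two neighbouring order values."""
    return (before + after) / 2


def normalize(rows):
    """Rewrite order values as 1, 2, 3, ... while keeping the current sequence."""
    ranked = sorted(rows, key=lambda r: r["order"])
    return [{**r, "order": i} for i, r in enumerate(ranked, start=1)]


rows = [
    {"id": 1, "name": "1st", "order": 1},
    {"id": 2, "name": "2nd", "order": 2},
    {"id": 3, "name": "3rd", "order": 3},
    {"id": 4, "name": "4th", "order": 4},
]

# Move id 4 between id 1 and id 2: only this one row is updated.
rows[3]["order"] = midpoint_order(rows[0]["order"], rows[1]["order"])

moved = [r["id"] for r in sorted(rows, key=lambda r: r["order"])]
print(moved)

# Occasional clean-up pass that restores whole-number order values.
normalized = normalize(rows)
print([(r["id"], r["order"]) for r in normalized])
```

Repeatedly halving the gap between the same two neighbours eventually exhausts floating-point precision, which is one reason a periodic normalization pass like the one proposed in the question is needed with this scheme.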

Generate variables that move information between rows in hierarchical data with spss syntax

I was wondering if you can help me with the following problem in spss syntax.
My dataset has nested structure.
Data are nested in companies, and each company has 1 or 2 bosses, but in this case I care only about boss 1. At an earlier point in time the boss graded the workers (not all of them). Currently, the ID and the grade of each worker are on that worker's own row.
I would like to move the information that was obtained during worker's assessment and create new sets of variables for each (worker ID and grade) on the line/row of the boss.
+---------+------+----------+----------------+-----------+----------+---------+----------+
| company | boss | workerID | worker's grade | N:workID1 | N:grade1 | N:work2 | N:grade2 |
+---------+------+----------+----------------+-----------+----------+---------+----------+
| A       | 1    | 1        |                | 3         | A        | 4       | A        |
| A       | 2    | 2        |                |           |          |         |          |
| A       | 0    | 3        | A              |           |          |         |          |
| A       | 0    | 4        | A              |           |          |         |          |
| A       | 0    | 5        |                |           |          |         |          |
| B       | 1    | 1        |                | 3         | B        | 4       | A        |
| B       | 0    | 2        |                |           |          |         |          |
| B       | 0    | 3        | B              |           |          |         |          |
| B       | 0    | 4        | A              |           |          |         |          |
| C       | 1    | 1        |                | 2         | D        | -1      | -1       |
| C       | 0    | 2        | D              |           |          |         |          |
+---------+------+----------+----------------+-----------+----------+---------+----------+
I would like to move each worker's ID and grade to the row of the boss in the NEW variables, without losing the existing workerID and worker's grade variables.
Basically, I will need to feed forward the information into the new variables and to the row of boss EQ 1 separately for each company.
I have no idea how to proceed with this. I assume that I need a loop that creates a new variable for each worker ID that has a valid grade and then feeds forward the information from the worker's row to the boss's newly generated variables.
Any suggestions are very welcome :-)
Take a look at VARSTOCASES (Data > Restructure)

How to make a SQL "IF-THEN-ELSE" statement

I've seen other questions about SQL If-then-else stuff, but I'm not seeing how to relate it to what I'm trying to do. I've been using SQL for about a year now but only basic stuff and never this.
If I have a SQL table that looks like this
| Name | Version | Category | Value | Number |
|:-----:|:-------:|:--------:|:-----:|:------:|
| File1 | 1.0 | Time | 123 | 1 |
| File1 | 1.0 | Size | 456 | 1 |
| File1 | 1.0 | Final | 789 | 1 |
| File2 | 1.0 | Time | 312 | 1 |
| File2 | 1.0 | Size | 645 | 1 |
| File2 | 1.0 | Final | 978 | 1 |
| File3 | 1.0 | Time | 741 | 1 |
| File3 | 1.0 | Size | 852 | 1 |
| File3 | 1.0 | Final | 963 | 1 |
| File1 | 1.1 | Time | 369 | 2 |
| File1 | 1.1 | Size | 258 | 2 |
| File1 | 1.1 | Final | 147 | 2 |
| File2 | 1.1 | Time | 741 | 2 |
| File2 | 1.1 | Size | 734 | 2 |
| File2 | 1.1 | Final | 942 | 2 |
| File3 | 1.1 | Time | 997 | 2 |
| File3 | 1.1 | Size | 997 | 2 |
| File3 | 1.1 | Final | 985 | 2 |
How can I write a SQL IF, ELSE statement that creates a new column called "Replication" that follows this rule:
A = B + 1 when x = 1
else
A = B
where A = the number we will use for the next Number
B = Max(Number)
x = Replication count (this is the number of times that a loop is executed. x=i)
The results table will look like this:
| Name | Version | Category | Value | Number | Replication |
|:-----:|:-------:|:--------:|:-----:|:------:|:-----------:|
| File1 | 1.0 | Time | 123 | 1 | 1 |
| File1 | 1.0 | Size | 456 | 1 | 1 |
| File1 | 1.0 | Final | 789 | 1 | 1 |
| File2 | 1.0 | Time | 312 | 1 | 1 |
| File2 | 1.0 | Size | 645 | 1 | 1 |
| File2 | 1.0 | Final | 978 | 1 | 1 |
| File1 | 1.0 | Time | 369 | 1 | 2 |
| File1 | 1.0 | Size | 258 | 1 | 2 |
| File1 | 1.0 | Final | 147 | 1 | 2 |
| File2 | 1.0 | Time | 741 | 1 | 2 |
| File2 | 1.0 | Size | 734 | 1 | 2 |
| File2 | 1.0 | Final | 942 | 1 | 2 |
| File1 | 1.1 | Time | 997 | 2 | 1 |
| File1 | 1.1 | Size | 997 | 2 | 1 |
| File1 | 1.1 | Final | 985 | 2 | 1 |
| File2 | 1.1 | Time | 438 | 2 | 1 |
| File2 | 1.1 | Size | 735 | 2 | 1 |
| File2 | 1.1 | Final | 768 | 2 | 1 |
| File1 | 1.1 | Time | 786 | 2 | 2 |
| File1 | 1.1 | Size | 486 | 2 | 2 |
| File1 | 1.1 | Final | 135 | 2 | 2 |
| File2 | 1.1 | Time | 379 | 2 | 2 |
| File2 | 1.1 | Size | 943 | 2 | 2 |
| File2 | 1.1 | Final | 735 | 2 | 2 |
EDIT: Based on the answer by Sean Lange, this is my 2nd attempt at a solution:
SELECT COALESCE(MAX(Number) + CASE WHEN Replication = 1 THEN 1 ELSE 0 END, 1) FROM Table
The COALESCE is in there for when there is no value yet in the Number column.
The IF/ELSE construct is used to control the flow of statements in T-SQL. You want a CASE expression, which is used to conditionally return values in a column.
https://msdn.microsoft.com/en-us/library/ms181765.aspx
Yours would be something like:
case when x = 1 then B + 1 else B end as A
As SeanLange pointed out, in this case it would be better to use a CASE/WHEN, but to illustrate how to use IF/ELSE, the way to do it in SQL is like this:
if x = 1
BEGIN
---Do something
END
ELSE
BEGIN
--Do something else
END
I would say the best way to know the difference, and when to use which, is this: if you are writing a query and want a different value to appear in a field based on a certain condition, use CASE/WHEN. If a certain condition should cause a series of steps to happen, use IF/ELSE.
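To make the CASE approach concrete, here is a small sketch that runs the rule "A = B + 1 when x = 1, else A = B" against an in-memory SQLite database from Python. SQLite is used only because it ships with Python; the table name `results`, the helper `next_number` and passing x as a query parameter are assumptions for this example, and T-SQL syntax may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (Name TEXT, Number INTEGER)")


def next_number(conn, replication):
    """MAX(Number) + 1 when replication is 1, else MAX(Number); 1 if the table is empty."""
    row = conn.execute(
        """
        SELECT COALESCE(
                   MAX(Number) + CASE WHEN ? = 1 THEN 1 ELSE 0 END,
                   1)
        FROM results
        """,
        (replication,),
    ).fetchone()
    return row[0]


# No rows yet: MAX(Number) is NULL, so COALESCE falls back to 1.
empty_case = next_number(conn, 1)

conn.executemany("INSERT INTO results VALUES (?, ?)", [("File1", 1), ("File2", 1)])

first_rep = next_number(conn, 1)  # replication 1: MAX(Number) + 1
later_rep = next_number(conn, 2)  # any other replication: MAX(Number)
print(empty_case, first_rep, later_rep)
```

The CASE expression only chooses between adding 1 and adding 0 inside a single SELECT, which is the situation the answers above describe as a fit for CASE rather than IF/ELSE.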

What's the idiomatic way to split a Smalltalk array at the spot where a series of values changes?

Given an array of domain objects (with the properties subject, trial and run) like this:
+---------+-------+-----+
| Subject | Trial | Run |
+---------+-------+-----+
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 2 |
| 1 | 4 | 2 |
| 2 | 1 | 1 |
| 2 | 2 | 1 |
| 1 | 1 | 1 |
| 1 | 2 | 1 |
+---------+-------+-----+
I want to split it into multiple arrays at every point where the value for subject changes.
The above example should result in three arrays:
+---------+-------+-----+
| Subject | Trial | Run |
+---------+-------+-----+
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 2 |
| 1 | 4 | 2 |
+---------+-------+-----+
+---------+-------+-----+
| 2 | 1 | 1 |
| 2 | 2 | 1 |
+---------+-------+-----+
+---------+-------+-----+
| 1 | 1 | 1 |
| 1 | 2 | 1 |
+---------+-------+-----+
What would be the idiomatic Smalltalk (Pharo) way to split the array like this?
SequenceableCollection >> piecesCutWhere: (which takes a binary block) is your friend:
{ 1. 1. 2. 2. 2. 3. 1. 2. } piecesCutWhere: [:left :right | left ~= right]
=> an OrderedCollection #(1 1) #(2 2 2) #(3) #(1) #(2)
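For comparison, the same "cut wherever the value changes" idea can be sketched in Python with `itertools.groupby`, which groups consecutive items that share a key. This is not Smalltalk and not a substitute for the Pharo answer above; the helper name `split_on_change` and the tuple layout (subject, trial, run) are assumptions for the illustration.

```python
from itertools import groupby

# (subject, trial, run) tuples from the question's table.
records = [
    (1, 1, 1), (1, 2, 1), (1, 3, 2), (1, 4, 2),
    (2, 1, 1), (2, 2, 1),
    (1, 1, 1), (1, 2, 1),
]


def split_on_change(items, key):
    """Split items into consecutive runs; a new run starts whenever key(item) changes."""
    return [list(run) for _, run in groupby(items, key=key)]


pieces = split_on_change(records, key=lambda rec: rec[0])
for piece in pieces:
    print(piece)
```

Because only adjacent items are compared, the second block of subject 1 stays separate from the first, matching the three arrays the question asks for.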

Understanding how to convert a multi-dimensional array to a one-dimensional array

There is a really good explanation of multi-dimensional arrays here on Stack Overflow, which I have studied and researched, but I have a few follow-up questions for anyone who wants to help out. This is not a homework question; it is from my textbook, which I am trying to understand better, so please confirm whether I am looking at the example below correctly. Thank you in advance.
So if I had a 3-dimensional array such as this:
{{{'1','2'},{'3','4'}},
{{'5','6'},{'7','8'}},
{{'9','10'},{'11','12'}}};
Would the one-dimensional result (using a C compiler) simply be:
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| | | | | | | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
And the corresponding position as?
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| | | | | | | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
Again I am using this link as my source.
The only thing I am looking for as a form of answer is, am I looking/doing this correctly? If not, I would appreciate it if you can tell me where I have made any mistakes. Thank you again.
1.
char [3][2][2] :
+-----+-----+ +-----+-----+
|+-----+-----+ |+-----+-----+
|| 1 | 3 | || 4 | 5 |
||1,0+-----+-----+ || +-----+-----+
|+---| a | b | |+---| 0 | 1 |
|| 2|0,0,0|0,0,1| || 6| | |
+|1,1+-----+-----+ => +| +-----+-----+
+---| x | y | +---| 2 | 3 |
|0,1,0|0,1,1| | | |
+-----+-----+ +-----+-----+
so your outcome seems ok, and thus (2.) t3[0] should be a.
2.
if t2 looks like this, t2[0][1] is b:
+-----+-----+-----+-----+ +-----+-----+-----+-----+
| a | b | x | y | | | | | |
|0,0,0|0,0,1|0,1,0|0,1,1| | 0,0 | 0,1 | 0,2 | 0,3 |
+-----+-----+-----+-----+ +-----+-----+-----+-----+
| 1 | 3 | 2 | 7 | => | | | | |
|1,0,0|1,0,1|1,1,0|1,1,1| | 1,0 | 1,1 | 1,2 | 1,3 |
+-----+-----+-----+-----+ +-----+-----+-----+-----+
| q | g | r | 4 | | | | | |
|2,0,0|2,0,1|2,1,0|2,1,1| | 2,0 | 2,1 | 2,2 | 2,3 |
+-----+-----+-----+-----+ +-----+-----+-----+-----+
As long as you are converting them the right way (as it seems you are, according to the link), it should work...
For conceptual understanding this is a good starting point.
But you should understand the difference between row-major and column-major order. Technically, the layout can vary between languages and compilers depending on what they are designed for.
http://en.wikipedia.org/wiki/Row-major_order
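The row-major mapping discussed above can be checked with a short sketch. It is written in Python rather than C, and the helper name `flat_index` is an assumption for this example; the formula itself is the standard row-major offset for a [3][2][2] array, which is the layout C uses for multi-dimensional arrays.

```python
def flat_index(i, j, k, dims):
    """Row-major offset of element [i][j][k] in an array with shape dims = (d0, d1, d2)."""
    d0, d1, d2 = dims
    return (i * d1 + j) * d2 + k


dims = (3, 2, 2)
nested = [[["1", "2"], ["3", "4"]],
          [["5", "6"], ["7", "8"]],
          [["9", "10"], ["11", "12"]]]

# Copy every element to its row-major position in a one-dimensional list.
flat = [None] * (dims[0] * dims[1] * dims[2])
for i in range(dims[0]):
    for j in range(dims[1]):
        for k in range(dims[2]):
            flat[flat_index(i, j, k, dims)] = nested[i][j][k]

print(flat)
```

The last index varies fastest, so the flattened list comes out in the same 1 to 12 order, at positions 0 to 11, as the tables in the question.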

Resources