I have a concern about data organisation and the best approach to simplify some multi-layered data. Simply, I have a 10 replicates of small wood beams (BeamID, ~10) subjected to a 10 different treatment (TreatID, ~10), and each beam is load tested which produces a series data of a Load with consequent Displacement (ranging from 10 to 50 rows per test; I have code that corrects for disparities in row length). Each wood beam is tested multiple times (Rep, ~10).
My plan was to lump all this data into a 5-D array:
Array[Load, Deflection, BeamID, TreatID, Rep]
This way, I should be able to plot the load~deflection curves for a given BeamID, TreatID, for all Reps by using Array[ , ,1,1, ], right? So the hypothetical output for Array[ , ,1,1,1], would be:
+------------+--------+-----+
| Deflection | Load | Rep |
+------------+--------+-----+
| 0 | 0 | 1 |
| 6.35 | 10.5 | 1 |
| 12.7 | 20.8 | 1 |
| 19.05 | 45.3 | 1 |
| 25.4 | 75.2 | 1 |
+------------+--------+-----+
And Array[ , ,1,1,2] would be:
+------------+--------+-----+
| Deflection | Load | Rep |
+------------+--------+-----+
| 0 | 0 | 2 |
| 7.3025 | 12.075 | 2 |
| 14.605 | 23.92 | 2 |
| 21.9075 | 52.095 | 2 |
| 29.21 | 86.48 | 2 |
+------------+--------+-----+
Or I think I could keep it as a simpler, 'melted' dataframe, which would have columns for Load and Deflection, and BeamID, TreatID, and Rep would be repeated for each row of the test output.
+------------+--------+-----+--------+---------+
| Deflection | Load | Rep | BeamID | TreatID |
+------------+--------+-----+--------+---------+
| 0 | 0 | 1 | 1 | 1 |
| 6.35 | 10.5 | 1 | 1 | 1 |
| 12.7 | 20.8 | 1 | 1 | 1 |
| 19.05 | 45.3 | 1 | 1 | 1 |
| 25.4 | 75.2 | 1 | 1 | 1 |
| 0 | 0 | 2 | 1 | 1 |
| 7.3025 | 12.075 | 2 | 1 | 1 |
| 14.605 | 23.92 | 2 | 1 | 1 |
| 21.9075 | 52.095 | 2 | 1 | 1 |
| 29.21 | 86.48 | 2 | 1 | 1 |
+------------+--------+-----+--------+---------+
However, with the latter, I'm not sure how I could easily and discretely pull out all the Rep test values for a specific BeamID and TreatID, especially since I use a linear model to fit a 3rd order polynomial for an specific test to extract the slope of the curves. Having it as a continuous dataframe means I'd have to specify starting and stopping points to start the linear model, correct?
Thoughts, suggestions? Am I headed in the right direction in using a 5-D array? R is a new programming language for me, so please pardon my misunderstandings.
I have a table with a number of variables such as:
+-----------+------------+---------+-----------+--------+
| DateFrom | DateTo | Price | Discount | Cost |
+-----------+------------+---------+-----------+--------+
| 01jan17 | 01jul17 | 17 | 4 | 5 |
| 01aug17 | 01feb18 | 15 | 1 | 3 |
| 01mar18 | 01dec18 | 12 | 2 | 1 |
| ... | ... | ... | ... | ... |
+-----------+------------+---------+-----------+--------+
However I want to split this so I have:
+------------+------------+----------+-------------+---------+-------------+------------+----------+-------------+-------------+
| DateFrom1 | DateTo1 | Price1 | Discount1 | Cost1 | DateFrom2 | DateTo2 | Price2 | Discount2 | Cost2 ... |
+------------+------------+----------+-------------+---------+-------------+------------+----------+-------------+-------------+
| 01jan17 | 01jul17 | 17 | 4 | 5 | 01aug17 | 01feb18 | 15 | 1 | 3 |
+------------+------------+----------+-------------+---------+-------------+------------+----------+-------------+-------------+
There's a cool (not at all obvious) solution using proc summary and the idgroup statement that only takes a few lines of code. This runs in memory and you're likely to come into problems if the dataset is large, otherwise this works very well.
Note that out[3] relates to the number of rows in the source data. You could easily make this dynamic by adding a prior step that calculates the number of rows and stores it in a macro variable.
/* create initial dataset */
data have;
input (DateFrom DateTo) (:date7.) Price Discount Cost;
format DateFrom DateTo date7.;
datalines;
01jan17 01jul17 17 4 5
01aug17 01feb18 15 1 3
01mar18 01dec18 12 2 1
;
run;
/* transform data into 1 row */
proc summary data=have nway;
output out=want (drop=_:)
idgroup(out[3] (_all_)=) / autoname;
run;
I have a table that keeps inventory information for products in stores on daily basis. It is like:
|------------|-----------|---------|-----------------|
| Date | ProductId | StoreId | InventoryOnHand |
|------------|-----------|---------|-----------------|
| 2017-10-11 | 348 | 121 | 2 |
| 2017-10-11 | 110 | 200 | 0 |
| 2017-10-11 | 254 | 587 | -2 |
| 2017-10-12 | 311 | 875 | 26 |
| 2017-10-12 | 954 | 364 | 15 |
| 2017-10-12 | 348 | 121 | 0 |
| 2017-10-12 | 441 | 121 | 7 |
| . | . | . | . |
| . | . | . | . |
| . | . | . | . |
|------------|-----------|---------|-----------------|
In most queries I used have condition like WHERE InventoryOnHand > 0. I need to speed up these queries.
Therefore, I want to build and index that separates values on column InventoryOnHand whether they are greater than 0 or not.
Filtered Index does not solve my problem because if I use filtered index all values greater than 0 will be indexed and this increases index size. I only need to know if a value greater than 0 or not.
i.e. I want to build an index that only works when condition is InventoryOnHand > 0. Is there any way to do this on SQL-Server?
I need to make 2 database constraints that connect two different tables at one time.
1. The total score of the four quarters equals the total score of the game the quarters belong to.
2. The total point of all the players equals to the score of the game of that team.
Here is what my tables look like.
quarter table
+------+--------+--------+--------+
| gNum | Period | hScore | aScore |
+------+--------+--------+--------+
| 1 | 1 | 13 | 18 |
| 1 | 2 | 12 | 19 |
| 1 | 3 | 23 | 31 |
| 1 | 4 | 32 | 18 |
| | | Total | Total |
| | | 80 | 86 |
+------+--------+--------+--------+
Game Table
+-----+--------+--------+--------+
| gID | hScore | lScore | tScore |
+-----+--------+--------+--------+
| 1 | 86 | 80 | 166 |
+-----+--------+--------+--------+
Player Table
+-----+------+--------+--------+
| pID | gNum | Period | Points |
+-----+------+--------+--------+
| 1 | 1 | 1 | 20 |
| | | 2 | 20 |
| | | 3 | 20 |
| | | 4 | 20 |
+-----+------+--------+--------+
So Virtually I need to use CHECK I think to make sure that players points = score of their team ie (hScore, aScore) and also make sure that the hScore and aScore = the total score in the Game table.
I was thinking of creating a foreign key variable on one of the tables and setting up constraints on that would this be the best way of going about it?
Thanks
I have a table in Access database as below;
Name | Range | X | Y | Z
------------------------------
A | 100-200 | 1 | 2 | 3
A | 200-300 | 4 | 5 | 6
B | 100-200 | 10 | 11 | 12
B | 200-300 | 13 | 14 | 15
C | 200-300 | 16 | 17 | 18
C | 300-400 | 19 | 20 | 21
I have trying write a query that convert this into the following format.
Name | X_100_200 | Y_100_200 | Z_100_200 | X_200_300 | Y_200_300 | Z_200_300 | X_300_400 | Y_300_400 | Z_300_400
A | 1 | 2 | 3 | 4 | 5 | 6 | | |
B | 10 | 11 | 12 | 13 | 14 | 15 | | |
C | | | | 16 | 17 | 18 | 19 | 20 | 21
After trying for a while the best method I could come-up with is to write bunch of short queries that selects the data for each Range and then put them together again using a Union query. The problem is that for this example I have shown 3 columns (X, Y and Z), but I actually have much more. Access is starting to strain with the amount of SQL I have come up with.
Is there a better way to achieve this?
The answer was simple. Just use Access Pivotview. Finding it hard to export the results to Excel though.