I have a table with 200 columns (maybe more...)
a1 a2 a3 a4 a5 ...a200
---------------------------------
1.2 2.3 4.4 5.1 6.7... 11.9
7.2 2.3 4.3 5.1 4.7... 3.9
1.9 5.3 3.3 5.1 3.7... 8.9
5.2 2.7 7.4 9.1 1.7... 2.9
I would like to compute many operations:
SUM(every column)
AVG(every column)
SQRT(SUM(every column))
POWER(SUM(every column),2)
MIN(all columns)
MAX(all columns)
GREATEST(SUM(one column) vs SUM(other column))
that is, finding which sum is greater for every pair of columns:
a1 vs a2, a1 vs a3, a1 vs a4, ..., a1 vs a200,
a2 vs a1, a2 vs a3, a2 vs a4, ..., a2 vs a200,
...
a200 vs a1, a200 vs a2, a200 vs a3, ..., a200 vs a199
If I write a single SELECT statement covering every column and every operation, I get:
SELECT
SUM(a1),...,SUM(a200),
AVG(a1),...,AVG(a200),
POWER(SUM(a1),2),...,POWER(SUM(a200),2),
GREATEST(SUM(a1),SUM(a2)), GREATEST(SUM(a1),SUM(a3)),...,GREATEST(SUM(a1),SUM(a200)),
GREATEST(SUM(a2),SUM(a1)), GREATEST(SUM(a2),SUM(a3)),...,GREATEST(SUM(a2),SUM(a200)),...,
GREATEST(SUM(a200),SUM(a1)), GREATEST(SUM(a200),SUM(a2)),...,GREATEST(SUM(a200),SUM(a199))
etc... FROM tabMultipleColumns
The problem arises when the query produces more than 1024 result columns.
Is there a way to keep running this many operations with a single scan of the table, that is, without issuing multiple SELECT statements?
I am trying to use a single scan because, if the table is huge (many GB), scanning it repeatedly with multiple SELECT statements would be expensive.
Could a tool like BCP be used, or what solution do you think would be more efficient?
Even SUM, POWER(SUM(),2) and SQRT(SUM()) alone give 600 result columns; if I keep adding operations, I exceed 1024.
That's a lot of calculations. I would probably just do a periodic dump of them into another table to minimize server load. It depends on how often the query is going to be used though.
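One possible way to stay under the result-column limit, sketched here in Python rather than T-SQL: most of the requested values depend only on the per-column sums and the row count, so a single scan could return just those (for example COUNT(*) plus SUM(a1), ..., SUM(a200)) and everything else could be derived afterwards. The function names `column_sums` and `derive_stats` are hypothetical, the sample rows are the first three columns of the table in the question, and the sketch assumes at least one row and non-negative sums (otherwise the square root is undefined).

```python
import math
from itertools import permutations


def column_sums(rows):
    """Single pass over the rows: returns (row_count, list of per-column sums)."""
    count = 0
    sums = None
    for row in rows:
        if sums is None:
            sums = [0.0] * len(row)
        for i, value in enumerate(row):
            sums[i] += value
        count += 1
    return count, sums or []


def derive_stats(count, sums):
    """Derive the remaining statistics from the sums alone, with no further scans."""
    return {
        "avg": [s / count for s in sums],
        "sqrt_sum": [math.sqrt(s) for s in sums],
        "sum_squared": [s ** 2 for s in sums],
        # One entry per ordered pair (i, j), i != j, mirroring "a1 vs a2, a2 vs a1, ...".
        "greatest": {
            (i, j): max(sums[i], sums[j])
            for i, j in permutations(range(len(sums)), 2)
        },
    }


# Columns a1..a3 of the sample data from the question.
sample_rows = [
    (1.2, 2.3, 4.4),
    (7.2, 2.3, 4.3),
    (1.9, 5.3, 3.3),
    (5.2, 2.7, 7.4),
]
count, sums = column_sums(sample_rows)
stats = derive_stats(count, sums)
```

With 200 columns the pairwise comparison alone has 200 × 199 = 39,800 entries, so deriving it outside the SELECT list avoids most of the result columns. MIN and MAX cannot be derived from the sums and would still need their own aggregates in the scan.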
I am stuck on modelling a tiny example in Microsoft SQL Server Analysis Services. I consider myself advanced in this technology, but I cannot get my head around this problem.
It comes down to a measure group which is restricted by three dimensions. Two dimensions behave exactly as expected, limiting the rows delivered back.
The third dimension does not have any effect on this measure group. For each entry of this third dimension, the same value (the total) is returned as the measure. The only noticeable difference between the dimensions is that the ones that work are linked through their key attribute, while the one that does not work is linked through a non-key attribute.
The item names are in German; I hope the following translation helps:
Benutzer -> User
Kunde -> Customer
Bestellung -> Order
I hope the following screen shots help to show the relevant modelling details:
Dim Bestellung (the one that is not working)
Dimension Usage in Cube
MDX Query that is working
If I do query using the Kunde ("customer") dimension which is linked on key, everything behaves as expected. For the customers with 1s and 3s in their "name", there is a result, and null for the others.
SELECT {
[Measures].[Anzahl SecKunde]
} ON COLUMNS
, {
[DIM Kunde].[Hierarchie Kunde].[Kunde]
} ON ROWS
FROM
[BiEvaluation]
WHERE
[DIM Benutzer].[Hierarchie Benutzer].[Benutzer].&[Domain\User1]
MDX Query not working
However, if I query by Bestellung ("Order"), I get the total of 2 for each order. The orders B1, B2, B5 and B6 are from customers 1 and 3, so I'd like to see a count of 1 there!
SELECT {
[Measures].[Anzahl SecKunde]
} ON COLUMNS
, {
[DIM Bestellung].[Hierarchie Bestellung].[Bestellung].AllMembers
} ON ROWS
FROM
[BiEvaluation]
WHERE
[DIM Benutzer].[Hierarchie Benutzer].[Benutzer].&[Domain\User1]
MDX Query not working "fixed"
I found a way to make it work, yet I do not understand why. The "trick" is to include the non-key attribute on which the dimension-to-fact link is defined. This gives the result I consider correct, a 1 for B1, B2, B5 and B6 (because those orders were placed by customers 1 and 3).
SELECT {
[Measures].[Anzahl SecKunde]
} ON COLUMNS
, {
[DIM Bestellung].[Hierarchie Bestellung].[Bestellung].AllMembers
* [DIM Bestellung].[ID Kunde].[ID Kunde].AllMembers -- Only change needed
} ON ROWS
FROM
[BiEvaluation]
WHERE
[DIM Benutzer].[Hierarchie Benutzer].[Benutzer].&[Domain\User1]
Sorry for this awfully long post. If anybody can share any idea why this is happening, I'd appreciate it a lot.
I could most likely work with including the attribute, but this is a huge source of problems and errors. I do not understand why having it there makes this difference.
I think it is not relevant to the question, but I'll include the versions of the tools anyway:
Visual Studio 2017 15.9.41
SQL Server Analysis Services-Designer 15.0.19623.0
SQL Server 2017 (SSDB) 14.0.3370.1 (CU22 + Security Update)
SQL Server 2017 (SSAS) 14.0.249.62 (CU22)
I have the array formula below, which checks 3 different criteria and returns multiple results from a data source with 10000 rows.
{=IF(INDEX(Inventory!$A$3:$Q$10000;SMALL(IF(($C$4=Inventory!$A$3:$A$10000)*($C$3=Inventory!$E$3:$E$10000)*(Inventory!$F$3:$F$10000="NEW");ROW(Inventory!$A$3:$A$10000)-ROW($C$3)+2);ROW(Inventory!1:1));16)=0;"";INDEX(Inventory!$A$3:$Q$10000;SMALL(IF(($C$4=Inventory!$A$3:$A$10000)*($C$3=Inventory!$E$3:$E$10000)*(Inventory!$F$3:$F$10000="NEW");ROW(Inventory!$A$3:$A$10000)-ROW($C$3)+2);ROW(Inventory!1:1));16))}
The Inventory table goes like this
A |E |F |P
Standard Laptop |Lisbon |NEW |XCVBMT
Engineering Laptop |London |DAMAGED |CVFTYU
Multiple Vendor |Madrid |QUARANTINE |CVBLPU
Standard Laptop |Lisbon |NEW |JKHGLK
The criteria for columns A and E are selected from drop-down lists in C3 and C4.
If I delete the criterion below from the array formula, it works:
($C$4=Inventory!$A$3:$A$10000)
I cleared all formats, changed rows, moved the criterion to D4 and typed it in manually, trimmed the values... I think the answer is right in front of me, but I have no clue what's wrong.
I hope this is enough information.
Thanks
I have a data sheet in which column A holds company names (A through Z for simplicity) and columns B through F hold financial information (stock amounts, retained earnings, etc.).
(row 1) Company Stock Dividends Net Income Retained Earnings
(row 2) A 5.4 7.6 44.5 57.5
(row 3) B 8.2 8.4 78.6 88.9
(row 4) C 13.4 2.2 14.4 14.5
(row 5) D 4.7 5.4 8.9 16.7
...
(row 27)Z 5.6 8.4 12.5 11.1
(row 29)Sum of the following companies:
(row 30)A Stock
(row 31)C Dividends
(row 32)D Net Income
(row 33)Z Retained Earnings
I am trying to return the sum for multiple companies from a data table based on the column name. For instance, I would want to find the total stock amount for companies A, C, D and Z (this list will change frequently, so I want a non-hard-coded method rather than typing the values into a {}). I would also want the formula to reference a cell containing the column name, since the actual data table has about 15 different column variables.
What I have tried so far is to put an array into an INDEX/MATCH/MATCH formula like the one below, entered in the cell to the left of "Stock" in row 30:
=INDEX($B$2:$F$27,MATCH($A$30:$A$33,$A$2:$A$27,0),MATCH($B30,$B$1:$F$1,0))
However, since I don't think MATCH accepts an array as its lookup value, I end up with #N/A.
Does anyone have any suggestions as to what I can do?
I know that you asked about a formula, but wonder if a PivotTable and slicer(s) might not do what you want with minimal effort. Here, for example, it displays the total financial amounts, including stocks, for companies A,C,D and Z. BTW it took less time to produce the PivotTable and slicer than it has taken to type this answer.
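The lookup the question describes can also be viewed the other way around: instead of matching an array of company names against the table, test each table row for membership in the list and add up the named column for the rows that pass. A minimal Python sketch of that logic follows; it is not an Excel formula, the function name `sum_for` is hypothetical, and the table holds only the five sample rows shown in the question.

```python
# Header and sample rows copied from the question (rows for companies E..Y omitted there).
header = ["Company", "Stock", "Dividends", "Net Income", "Retained Earnings"]
table = [
    ["A", 5.4, 7.6, 44.5, 57.5],
    ["B", 8.2, 8.4, 78.6, 88.9],
    ["C", 13.4, 2.2, 14.4, 14.5],
    ["D", 4.7, 5.4, 8.9, 16.7],
    ["Z", 5.6, 8.4, 12.5, 11.1],
]


def sum_for(companies, column):
    """Sum one named column over the rows whose company is in the given list."""
    col = header.index(column)   # column chosen by name, not by position
    wanted = set(companies)      # the list that "changes frequently"
    return sum(row[col] for row in table if row[0] in wanted)


# Total stock for companies A, C, D and Z: 5.4 + 13.4 + 4.7 + 5.6
stock_total = sum_for(["A", "C", "D", "Z"], "Stock")
```

Because both the company list and the column name are plain inputs, changing either one does not require touching the lookup itself, which is the non-hard-coded behaviour the question asks for.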
I have data that looks like this in Excel (like a Gantt chart):
        Column A   Column B            Column C
        Step 1     Days to Complete    Requires Step First
row 1   1.1        2
row 2   1.2        1
row 3   1.3        1                   1.1
row 4   1.4        0
row 5   1.5        1                   1.1
You can start steps 1.1, 1.2, and 1.4 right away, but you have to complete step 1.1 before you can start steps 1.3 or 1.5. What can I use to get the TOTAL DAYS TO COMPLETE = 3 result in a cell somewhere in my spreadsheet? I've tried SUMIF, but the values in column A and column C do not sit next to each other in the same row. Plus, if step 1.1 appears in column C twice, I only want the value in row 1 of column B to be added once.
Thanks in advance!
If you can use a helper column, I would do this: first check column C with an IF, and if it is blank, give the helper cell the value from column B:
=IF(C1 = "", B1, 0)
Once you have filled this formula down alongside your data (in column D here), you can sum that column in a cell with:
=SUM(D1:D10)
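Another reading of the question is that steps without prerequisites run in parallel, so the total is the length of the longest chain of dependent steps (here 1.1 for 2 days, then 1.3 or 1.5 for 1 day). A minimal Python sketch of that interpretation follows; it assumes each step has at most one prerequisite and that there are no circular dependencies, and the function name `total_days` is hypothetical.

```python
from functools import lru_cache

# Step -> (days to complete, prerequisite step or None), from the table in the question.
steps = {
    "1.1": (2, None),
    "1.2": (1, None),
    "1.3": (1, "1.1"),
    "1.4": (0, None),
    "1.5": (1, "1.1"),
}


def total_days(steps):
    """Length of the longest prerequisite chain, i.e. the earliest possible finish."""

    @lru_cache(maxsize=None)
    def finish(step):
        days, prerequisite = steps[step]
        # A step starts on day 0 unless it has to wait for its prerequisite to finish.
        start = finish(prerequisite) if prerequisite is not None else 0
        return start + days

    return max(finish(step) for step in steps)


print(total_days(steps))  # prints 3
```

Because each step's finish time is computed once and reused, a prerequisite that appears several times in column C (such as 1.1) is counted only once per chain.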
I have SQL 2008 (not R2). I would like to have a matrix report where the user can select one of the SQL result set columns to be the matrix column group.
For example
A B Value
a1 b1 10
a2 b2 20
a3 b2 30
So the possible matrices could be (user selects from dropdown with A, B).
By A
a1 a2 a3
Value 10 20 30
By B
b1 b2
Value 10 50
This question should solve your problem. It is a way to use a parameter to refer to a field in your dataset.
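I haven't actually done anything like this before, but I have a theory that you can modify the expression on the group to use an IIf statement that changes which field is grouped on.
So, for the column group, change the group expression to something like =IIf(Parameters!GroupBy.Value = 1, Fields!Field1.Value, Fields!Field2.Value), where GroupBy is a placeholder name for the report parameter and Field1 and Field2 stand for the two candidate fields.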
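The effect of switching the group field by parameter can be illustrated outside the report with a short Python sketch. It uses the three sample rows from the question; the function name `matrix_by` is hypothetical and the code only mimics the grouping, not the report definition itself.

```python
from collections import defaultdict

# Sample result set from the question.
rows = [
    {"A": "a1", "B": "b1", "Value": 10},
    {"A": "a2", "B": "b2", "Value": 20},
    {"A": "a3", "B": "b2", "Value": 30},
]


def matrix_by(rows, group_field):
    """Total Value per distinct entry of the field the user picked ("A" or "B")."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_field]] += row["Value"]
    return dict(totals)


by_a = matrix_by(rows, "A")  # one column per value of A
by_b = matrix_by(rows, "B")  # b2 collects the rows for a2 and a3
```

The field name plays the role of the drop-down parameter: the data set stays the same and only the grouping key changes.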