Inserting an artificial column in an MDX query - sql-server

For some reason I need to insert an artificial (dummy) column into an MDX expression (I need the query to return a specific number of columns).
To illustrate, this is my sample query:
SELECT {[Measures].[AFR],[Measures].[IB],[Measures].[IC All],[Measures].[IC_without_material],[Measures].[Nonconformance_PO],[Measures].[Nonconformance_GPT],[Measures].[PM_GPT_Weighted_Targets],[Measures].[PM_PO_Weighted_Targets], [Measures].[AVG_LC_Costs],[Measures].[AVG_MC_Costs]} ON COLUMNS,
([dim_ProductModel].[PLA].&[SME])
* ORDER( {([dim_ProductModel].[Warranty Group].children)} , ([Measures].[Nonconformance_GPT],[Dim_Date].[Date Full].&[2014-01-01]) ,desc)
* ([dim_ProductModel].[PLA Text].members - [dim_ProductModel].[PLA Text].[All])
* {[Dim_Date].[Date Full].&[2013-01-01]:[Dim_Date].[Date Full].&[2014-01-01]} ON ROWS
FROM [cub_dashboard_spares]
The query itself is not very important; it is just some measures and cross-joined dimensions. Now I need to add, for example, 2 extra columns. I don't care whether they are measures with null/0 values or another cross-joined dimension. Can I do this in some easy way without inserting any data into my cube?
In SQL I can just write SELECT 0 or SELECT "dummy1", but here that is not possible in either the ON ROWS or the ON COLUMNS part of the query.
Thank you very much for your help,
Regards,
Peter
PS: So far I could just insert some measure several times, but I am interested in whether it is possible to insert a real "dummy" column.

Your query has only the measures dimension on columns. The easiest way to extend it by some columns would be to repeat the last measure as many times as needed to reach the correct number of columns.
Another possibility, which may be more efficient if the last measure is expensive to calculate, would be to use
WITH member Measures.dummy as NULL
SELECT {[Measures].[AFR],[Measures].[IB],[Measures].[IC All],[Measures].[IC_without_material],[Measures].[Nonconformance_PO],[Measures].[Nonconformance_GPT],[Measures].[PM_GPT_Weighted_Targets],[Measures].[PM_PO_Weighted_Targets], [Measures].[AVG_LC_Costs],[Measures].[AVG_MC_Costs],
Measures.dummy, Measures.dummy, Measures.dummy
}
ON COLUMNS,
([dim_ProductModel].[PLA].&[SME])
* ORDER( {([dim_ProductModel].[Warranty Group].children)} , ([Measures].[Nonconformance_GPT],[Dim_Date].[Date Full].&[2014-01-01]) ,desc)
* ([dim_ProductModel].[PLA Text].members - [dim_ProductModel].[PLA Text].[All])
* {[Dim_Date].[Date Full].&[2013-01-01]:[Dim_Date].[Date Full].&[2014-01-01]}
ON ROWS
FROM [cub_dashboard_spares]
i.e. add a dummy measure, which should need hardly any computation, to the end of the columns as many times as you need it.

Related

Expression to get a percentage

I have a report which has a transaction type as a row group. There are two different types. I want to get the percentage of type 2 compared to type 1.
I am not sure how to do this. I assume I need an expression which refers to the transaction type by name and then makes a calculation based on the other type.
So instead of a total for July of 300, I would like the percentage of SOP+ compared to SOP-, in this case 1.96%. For clarity, the figures in SOP+ are not treated as negative.
When you design a query to be used in a report, it is generally easier to work with different types of values being in separate columns. You can let the report do most of the grouping and aggregation for you. In that case, the expression would be something like this:
=Fields!SOP_PLUS.Value / Fields!SOP_MINUS.Value
Since in your case both types are in rows of the same column, you have to use some logic to separate them out into columns before doing the operation.
You'll need to add two calculated fields to your dataset, one per type. Use an expression like this to get the values (shown here for SOP+; the SOP- field is analogous):
=IIf(Fields!TYPE_CODE.Value = "SOP+", Fields!SOP.Value, Nothing)
In other words, you will have new columns that hold just the plus and minus values, with blanks in the other rows. Now you can use an expression similar to the earlier one to compare them.
=Max(Fields!SOP_PLUS.Value) / Max(Fields!SOP_MINUS.Value)
Keep in mind that the Max function applies to the current group scope. When you add multiple row and column groups to the mix, this can get more complicated. If that becomes an issue, I would suggest rewriting the query to provide these values in separate columns to make the report design easier.
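The two-step logic above (a calculated field per type, then Max within the group) can be sketched outside the report as well. This is only an illustration in Python, not SSRS code: the row layout, the field names and the August figures are assumptions made up for the example, and only the July figures (306 and 6) come from the question.

```python
# Hypothetical rows: (month, type_code, amount). Only July comes from the question.
rows = [
    ("July", "SOP-", 306),
    ("July", "SOP+", 6),
    ("August", "SOP-", 606),
    ("August", "SOP+", 14),
]


def add_calculated_fields(row):
    """Mimic the two calculated fields: the amount for one type, blank otherwise."""
    month, type_code, amount = row
    return {
        "month": month,
        "sop_plus": amount if type_code == "SOP+" else None,
        "sop_minus": amount if type_code == "SOP-" else None,
    }


def group_max(records, field):
    """Mimic Max() in the group scope: largest non-blank value per month."""
    best = {}
    for record in records:
        value = record[field]
        if value is None:
            continue
        month = record["month"]
        if month not in best or value > best[month]:
            best[month] = value
    return best


records = [add_calculated_fields(row) for row in rows]
plus = group_max(records, "sop_plus")
minus = group_max(records, "sop_minus")

# Percentage of SOP+ compared to SOP- for every month that has both values.
result = {month: plus[month] / minus[month] * 100 for month in minus if month in plus}
print({month: round(value, 2) for month, value in result.items()})
```

Because each month has exactly one non-blank value per calculated field, Max simply picks that value; with more groups in scope the grouping key would need to include them too.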
WITH table1([sop-], [sop+]) AS (
SELECT 306, -6
UNION ALL
SELECT 606, -14)
SELECT(CAST([sop+] AS DECIMAL(5, 2)) / CAST([sop-] AS DECIMAL(5, 2))) * 100.0 FROM table1;
Returns :
-1.960784000
-2.310231000

Is "offset-fetch and order by" ordering the whole table or partial table?

In SQL Server, if I try the following query:
select id from table
order by id
offset 1000000 ROWS
fetch next 1000000 ROWS ONLY;
How will SQL Server execute this? Which strategy does SQL Server use?
1. Sort the whole table first and then select the 1 million rows we need.
2. Sort part of the table and then return the 1 million rows we need.
I assume it is the 2nd option. If so, how does SQL Server decide which range of the table to sort?
Edit 1:
I am asking this question to understand what could cause the query to be slow. I am testing with two queries:
--Query 1:
select id from table
order by id
offset 1 ROWS
fetch next 1 ROWS ONLY;
and
--Query 2:
select id from table
order by id
offset 1000000000 ROWS
fetch next 1 ROWS ONLY;
I found that the second query can take about 30 minutes to finish, while the first takes almost 0 seconds.
So I am curious what causes this difference. Do the two queries spend the same time on the ORDER BY? Or does SQL Server even sort the whole table at all? The id is the clustered index column of the table, and I cannot imagine that sorting a terabyte table takes 0 seconds.
If the sorting takes the same time, the only difference would be the clustered index scan. The first query only needs to scan the first 1 or 10 (a small number of) rows, while the second query needs to scan a much bigger number of rows (>1000000000). But I am not quite sure whether this is correct.
Thank you for your help!
Let me take a simple example:
order by id
offset 50 rows fetch next 25 rows only
For the above query, the steps would be:
1. The table must be sorted by id (if it is not, you pay the penalty of a sort; there is no partial sort, always a full sort).
2. Then 50+25 rows are scanned (paying the cost of 75 rows) and only 25 rows are returned.
In an example on an orders table I have (orderid is the PK, so the rows are already sorted), you can see that even though only 20 rows are returned, you pay the cost of 120 rows.
Coming to your question: there is no partial sort (which means the first option, as far as the sort is concerned), even if you try to return just one row, as below:
select top 1 * from table
order by orderid
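The row counts in the steps above can be reproduced with a small simulation. This is a sketch in Python, not SQL Server internals: it assumes the rows already arrive in sorted order (as they would from a clustered index on the ORDER BY column), and the function name offset_fetch is made up for the illustration.

```python
def offset_fetch(sorted_rows, offset, fetch):
    """Return (page, rows_read) for OFFSET/FETCH over rows that are already sorted."""
    page = []
    rows_read = 0
    for row in sorted_rows:
        if rows_read >= offset + fetch:
            break  # nothing past offset + fetch has to be read
        rows_read += 1
        if rows_read > offset:
            page.append(row)  # rows before the offset are read but discarded
    return page, rows_read


# Hypothetical table of 1000 ids, already in index order.
page, rows_read = offset_fetch(range(1, 1001), offset=50, fetch=25)
print(len(page), rows_read)
```

With offset 50 and fetch 25 the loop touches 75 rows to return 25, and the number of rows read grows with the offset, which is consistent with a very large offset being slow even when no sort is needed.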

Excel quartile function with variable array criteria (like countif)

Hoping someone can help with my Excel query.
I want to use the quartile function (or similar, could use percentile if that's easier). I have data in a column but I want to limit the data I use from that column.
I have job departments in column A, people's salaries in column B (and other data in the other columns e.g name).
I want to use my one main data list (c. 2,000 rows) to pick out the quartiles for the 10 or so depts I have but I don't want to have to make 10 specific lists to calculate the quartile of each dept.
Is there an option to use a countif or similar function so that I can have a drop down list of my 10 depts and depending on what dept I select my summary table will show the quartiles relevant for just that dept?
Thanks
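Use an array formula: =QUARTILE(IF(A1:A1000=C2,B1:B1000),3) and press Ctrl + Shift + Enter after entering the formula. Note: C2 holds the department whose quartile you are calculating. The second argument of QUARTILE is the quartile number (0 to 4), so 3 returns the third quartile; if you would rather pass a fraction such as 0.75, use PERCENTILE instead: =PERCENTILE(IF(A1:A1000=C2,B1:B1000),0.75).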
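The idea behind the array formula (keep only the salaries whose department matches the selected one, then take quartiles of what is left) can be sketched in Python. The sample rows and the function name are hypothetical, and the "inclusive" method of statistics.quantiles is used on the assumption that it interpolates the same way as Excel's QUARTILE.

```python
from statistics import quantiles

# Hypothetical (department, salary) rows standing in for columns A and B.
data = [
    ("Sales", 30000), ("Sales", 40000), ("Sales", 50000),
    ("Sales", 60000), ("Sales", 70000),
    ("IT", 45000), ("IT", 55000), ("IT", 65000),
]


def department_quartiles(rows, department):
    """Return [Q1, Q2, Q3] for the salaries of a single department."""
    # Equivalent of IF(A1:A1000=C2, B1:B1000): keep only matching rows.
    salaries = [salary for dept, salary in rows if dept == department]
    return quantiles(salaries, n=4, method="inclusive")


print(department_quartiles(data, "Sales"))
print(department_quartiles(data, "IT"))
```

Changing the department argument plays the role of the drop-down cell C2: the same data list is reused and only the filter changes.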

Generating Working Hours using SQL Server Query

I have this data and I need to generate a query that will give the output below
You can do this kind of grouping of rows with 2 separate row_number()s: one over all the data, ordered by date, and a second one partitioned by code and ordered by date. To separate the groups, use the difference between these 2 row_number()s. When it changes, a new block of data starts. You can then use that number in the GROUP BY and take the minimum / maximum dates for each block.
For the final layout you can use PIVOT or SUM + CASE; most likely you will want a new row_number to get the rows aligned properly. Depending on whether data can be missing or mismatched, you will probably need additional checks.
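The row-number difference trick can be traced on a tiny example. The sketch below is Python rather than T-SQL, and the sample data (integer "dates" and the codes IN and OUT) is invented because the original data is not shown; only the technique comes from the answer.

```python
from collections import defaultdict

# Hypothetical input: (date, code) pairs; dates are plain integers for brevity.
rows = [(1, "IN"), (2, "IN"), (3, "OUT"), (4, "IN"), (5, "IN")]


def find_blocks(rows):
    """Group consecutive rows with the same code and return (start, end, code)."""
    ordered = sorted(rows)  # order by date
    per_code_counter = defaultdict(int)
    blocks = {}
    for rn_all, (date, code) in enumerate(ordered, start=1):
        # Second row number: counts rows per code, in date order.
        per_code_counter[code] += 1
        rn_code = per_code_counter[code]
        # The difference stays constant inside one uninterrupted block of a code.
        key = (code, rn_all - rn_code)
        start, end = blocks.get(key, (date, date))
        blocks[key] = (min(start, date), max(end, date))
    return sorted((start, end, code) for (code, _), (start, end) in blocks.items())


print(find_blocks(rows))
```

Here the differences are 0, 0, 2, 1, 1, so the two IN runs fall into separate groups even though they share a code, which is exactly what grouping by the difference is meant to achieve.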

100k Rows Returned in a random order, without a SQL time out please

Ok,
I've been doing a lot of reading on returning a random row set last year, and the solution we came up with was
ORDER BY newid()
This is fine for <5k rows. But when we are getting >10-20k rows we are getting SQL timeouts. The execution plan tells me that 76% of my query cost comes from this line, and removing it increases the speed by an order of magnitude when we have a large number of rows.
Our users have a requirement of doing up to 100k rows at a time like this.
To give you all a bit more detail:
We have a table with 2.6 million 4 digit alpha-numeric codes. We use a random set of these to gain entry into a venue. For example, if we have an event with a 5000 capacity, a random set of 5000 of these will be drawn from the table and issued to each customer as a bar-code; the bar-code scanning app at the door will then have the same list of 5000. The reason for using a 4 digit alpha-numeric code (and not a stupidly long number like a GUID) is that it is easy for people to write the number down (or SMS it to a friend) and just bring the number and have it entered manually, so we don't want a large number of characters. Customers love the last bit btw.
Is there a better way than ORDER BY newid(), or is there a faster way to get 100k random rows from a table with 2.6 mil?
Oh, and we are using MS SQL 2005.
Thanks,
Jo
There is an MSDN article entitled "Selecting Rows Randomly from a Large Table" that talks about this exact problem and shows a solution (using no sorting but instead using a WHERE clause on a generated column to filter the rows).
The reason your query is slow is that the ORDER BY clause causes the whole table to be copied into tempdb for sorting.
If you want to generate random 4-digit codes, why not just generate them instead of trying to pull them out of a database?
Generate 100k unique numbers from 0 to 1,679,615 (there are 36^4 = 1,679,616 unique four-digit alphanumeric codes, ignoring case, so 2.6 million rows must contain some duplicates) and convert them to your four-digit codes.
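A minimal sketch of that suggestion in Python, assuming the codes use the digits 0-9 and the letters A-Z with case ignored (the question does not spell out the exact alphabet). The names ALPHABET, to_code and random_codes are made up for the illustration.

```python
import random
import string

ALPHABET = string.digits + string.ascii_uppercase  # 36 symbols, case ignored (assumption)
CODE_LENGTH = 4


def to_code(number):
    """Convert an integer in [0, 36**4) to a fixed-width base-36 code."""
    chars = []
    for _ in range(CODE_LENGTH):
        number, remainder = divmod(number, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def random_codes(count, rng=random):
    """Draw `count` distinct numbers without replacement and encode them."""
    numbers = rng.sample(range(len(ALPHABET) ** CODE_LENGTH), count)
    return [to_code(n) for n in numbers]


codes = random_codes(100_000)
print(len(codes), len(set(codes)))
```

Sampling without replacement from the number range guarantees uniqueness up front, so no sort and no duplicate check against a table is needed.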
You don't have to sort.
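DECLARE @RandomNumber int
DECLARE @Threshold float
SELECT @RandomNumber = COUNT(*) FROM customers
-- 50000.0 forces a decimal division; 50000 / @RandomNumber is an integer division and yields 0 for large tables
SELECT @Threshold = 50000.0 / @RandomNumber
-- note: SQL Server evaluates rand() once per query, not once per row
SELECT TOP 50000 * FROM customers WHERE rand() > @Threshold ORDER BY newid()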
Just as a matter of interest, what is the performance like if you replace
ORDER BY newid()
by
ORDER BY CHECKSUM(newid())
One thought is to break the process down into steps. Add a column to the table for a GUID, then run an update statement that fills it with GUIDs. This can be done ahead of time if necessary. You should then be able to run the query with an ORDER BY on the GUID column to receive the results the same way.
Have you tried using % (modulo) on a given int column? Not sure what your table structure is, but you could do something like this:
select top 50000 *
from your_table
where CAST((CAST(ASCII(SUBSTRING(venuecode,1,1)) as varchar(3))+
CAST(ASCII(SUBSTRING(venuecode,2,1))as varchar(3))+
CAST(ASCII(SUBSTRING(venuecode,3,1))as varchar(3))+
CAST(ASCII(SUBSTRING(venuecode,4,1))as varchar(3))) as bigint) % 500000 between 0 and 50000
The above code takes all of your alphanumeric venue codes, converts them to integers and splits the entire table into 500,000 buckets, of which you take the top 50000 rows that fall between 0 and 50000. You can play with the number after the % sign (500,000) and with the BETWEEN range. This should randomize it for you. Not sure if the WHERE clause will bite you on performance, but it's worth a shot. Also, without an ORDER BY there is no guarantee of the order (if you have multiple CPUs and threading).
