I'm querying INFORMATION_SCHEMA.COLUMNS, and for materialized views with columns that use an AVG() aggregation I always get two rows for a single existing column, with COLUMN_NAME values like this:
PROFIT_AVG$SYS_FACADE$0
PROFIT_AVG$SYS_FACADE$1
First of all, what does SYS_FACADE mean?
Second, how can I filter out the duplicate row, since the two rows have different values for precision/scale? And which one would be the right one to keep: SYS_FACADE$1?
Thanks for any suggestions!
1. The NUMBER datatype in Snowflake has scale 0 (digits to the right of the decimal point), while FLOAT holds the valid, higher-scale value in the case of an average. It's Snowflake's internal way of handling the average calculation.
2. You can see this in the MV definition in the UI; it auto-generates the two columns (commented as "Internal aggregate").
3. Why are you looking to filter the columns?
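If you want to inspect the MV and its columns from SQL rather than the UI, something like this should work (a sketch; MY_MV stands in for your materialized view's name):
SHOW MATERIALIZED VIEWS LIKE 'MY_MV';   -- metadata about the MV itself
DESCRIBE MATERIALIZED VIEW MY_MV;       -- the MV's column list as Snowflake reports it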
I have a report with a transaction type as a row group. There are two different types, and I want to show one type as a percentage of the other.
I am not sure how to do this. I assume I need an expression that checks the name of the transaction type and then makes a calculation based on the other type.
So instead of the total for July being 300, I would like the percentage of SOP+ compared to SOP-, which in this case is 1.96%. For clarity, the figures in SOP+ are not treated as negative.
When you design a query to be used in a report, it is generally easier to work with different types of values being in separate columns. You can let the report do most of the grouping and aggregation for you. In that case, the expression would be something like this:
=Fields!SOP_PLUS.Value / Fields!SOP_MINUS.Value
Since they are both in rows in the same column, you have to use some logic to separate them out into columns and then do the operation.
You'll need to add two calculated fields to your dataset. Use an expression like this to get the values:
=IIf(Fields!TYPE_CODE.Value = "SOP+", Fields!SOP.Value, Nothing)
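...and a similar one for the second calculated field to pick up the minus rows (assuming the type value is stored exactly as "SOP-"):
=IIf(Fields!TYPE_CODE.Value = "SOP-", Fields!SOP.Value, Nothing)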
In other words, you will have new columns that have just the plus and minus values with blanks in the other rows. Now you can use a similar expression to earlier to compare them.
=Max(Fields!SOP_PLUS.Value) / Max(Fields!SOP_MINUS.Value)
Keep in mind that the Max function applies to the current group scope. When you add multiple row and column groups to the mix, this can get more complicated. If that becomes an issue, I would suggest rewriting the query to provide these values in separate columns, as shown in the sketch below, to make the report design easier.
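For reference, a hedged sketch of what that reshaped query could look like in SQL (the table and column names are hypothetical, based on the field names used above):
SELECT  [Month],
        SUM(CASE WHEN TYPE_CODE = 'SOP+' THEN SOP ELSE 0 END) AS SOP_PLUS,
        SUM(CASE WHEN TYPE_CODE = 'SOP-' THEN SOP ELSE 0 END) AS SOP_MINUS
FROM    dbo.Transactions          -- hypothetical source table
GROUP BY [Month]
With the values in separate columns, the report expression is simply the =Fields!SOP_PLUS.Value / Fields!SOP_MINUS.Value shown earlier.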
WITH table1([sop-], [sop+]) AS (
    SELECT 306, -6
    UNION ALL
    SELECT 606, -14
)
SELECT (CAST([sop+] AS DECIMAL(5, 2)) / CAST([sop-] AS DECIMAL(5, 2))) * 100.0 FROM table1;
Returns:
-1.960784000
-2.310231000
For some reason I need to insert an artificial (dummy) column into an MDX expression (the reason is that I need to obtain a query with a specific number of columns).
To illustrate, this is my sample query:
SELECT {[Measures].[AFR],[Measures].[IB],[Measures].[IC All],[Measures].[IC_without_material],[Measures].[Nonconformance_PO],[Measures].[Nonconformance_GPT],[Measures].[PM_GPT_Weighted_Targets],[Measures].[PM_PO_Weighted_Targets], [Measures].[AVG_LC_Costs],[Measures].[AVG_MC_Costs]} ON COLUMNS,
([dim_ProductModel].[PLA].&[SME])
* ORDER( {([dim_ProductModel].[Warranty Group].children)} , ([Measures].[Nonconformance_GPT],[Dim_Date].[Date Full].&[2014-01-01]) ,desc)
* ([dim_ProductModel].[PLA Text].members - [dim_ProductModel].[PLA Text].[All])
* {[Dim_Date].[Date Full].&[2013-01-01]:[Dim_Date].[Date Full].&[2014-01-01]} ON ROWS
FROM [cub_dashboard_spares]
It is not very important, just some measures and crossjoined dimensions. Now I would need to add, for example, 2 extra columns; I don't care whether they would be a measure with null/0 values or another crossjoined dimension. Can I do this in some easy way without inserting any data into my cube?
In SQL I can just write SELECT 0 or SELECT 'dummy1', but here that is not possible, either in the ON ROWS or in the ON COLUMNS part of the query.
Thank you very much for your help,
Regards,
Peter
PS: so far I could just repeat some measure several times, but I am interested in whether there is a way to insert a really "dummy" column.
Your query just has the Measures dimension on columns. The easiest way to extend it by a few columns would be to repeat the last measure as many times as needed to get the correct number of columns.
Another possibility, which may be more efficient in case the last measure is expensive to calculate, would be to use
WITH member Measures.dummy as NULL
SELECT {[Measures].[AFR],[Measures].[IB],[Measures].[IC All],[Measures].[IC_without_material],[Measures].[Nonconformance_PO],[Measures].[Nonconformance_GPT],[Measures].[PM_GPT_Weighted_Targets],[Measures].[PM_PO_Weighted_Targets], [Measures].[AVG_LC_Costs],[Measures].[AVG_MC_Costs],
Measures.dummy, Measures.dummy, Measures.dummy
}
ON COLUMNS,
([dim_ProductModel].[PLA].&[SME])
* ORDER( {([dim_ProductModel].[Warranty Group].children)} , ([Measures].[Nonconformance_GPT],[Dim_Date].[Date Full].&[2014-01-01]) ,desc)
* ([dim_ProductModel].[PLA Text].members - [dim_ProductModel].[PLA Text].[All])
* {[Dim_Date].[Date Full].&[2013-01-01]:[Dim_Date].[Date Full].&[2014-01-01]}
ON ROWS
FROM [cub_dashboard_spares]
i.e. adding a dummy measure, which should not need much computation, to the end of the columns as many times as you need.
I've been reading about data demographics in Teradata and came across these two terms. It is mentioned that the two go hand in hand in making a good index choice, but I can't seem to understand exactly what the difference between the two values is.
Can anyone explain the exact difference between the two? Examples of how the values are derived would be really helpful.
I'm thinking both values will come from this query:
sel <columnname>, count(*)
from <tablename>
group by <columnname>
Here are the definitions of the two terms, btw.
Maximum Rows/Value – number of rows for the most-often-occurring value in the column.
Typical Rows/Value – number of rows for a typical value in the column.
Any input will be much appreciated.
Thank you.
Here is my understanding of Maximum Rows/Value vs Typical Rows/Value.
Suppose we run the following (SQL Fiddle link: http://sqlfiddle.com/#!4/27641/13/0):
SELECT MAX (COUNT ("sometext")) max_row_per_value
FROM table1
GROUP BY id
And here is the result
MAX_ROW_PER_VALUE
7
In this case, when you look at id=1, there are 7 records for that value, being the maximum rows/value.
The typical rows/value is what I consider the AVG(), like this:
SELECT AVG (COUNT ("sometext")) typical_row_per_value
FROM table1
GROUP BY id
Result
TYPICAL_ROW_PER_VALUE
4.5
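Tying this back to the query in the question, both demographics can be computed in one pass over the per-value counts (a sketch reusing the question's placeholders; the cast just keeps the average from being truncated to an integer):
sel max(cnt) as maximum_rows_per_value,
    avg(cast(cnt as float)) as typical_rows_per_value
from (
    sel <columnname>, count(*) as cnt
    from <tablename>
    group by <columnname>
) as per_value_counts;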
We are trying to speed up some of our stored procs by reducing implicit conversions. One of the issues we are trying to figure out is how to fix several indexed views similar to this:
Time.UserID is INT
Time.TimeUnit is DECIMAL(9,2)
Time.BillingRate is MONEY
Select
UserID,
SUM(TimeUnit) as Hours,
SUM(TimeUnit*BillingRate) as BillableDollars
FROM
Time
GROUP BY
UserID
gives us a view with the columns:
UserID(int, null)
Hours(decimal(38,2), null)
BillableDollars(decimal(38,6), null)
We would prefer to have Hours(decimal(9,2),null) and BillableDollars(money,null).
Trying
CAST(SUM(TimeUnit*BillingRate) AS MONEY) as BillableDollars
returned:
Cannot create the clustered index 'ix_indexName' on
view 'x.dbo.vw_viewName' because the
select list of the view contains an expression on result of aggregate
function or grouping column. Consider removing expression on result of
aggregate function or grouping column from select list.
And we were worried about the efficiency of SUM(CAST(TimeUnit*BillingRate AS MONEY)) as BillableDollars
What would be the best way to preserve these column types or is there a 'best practice'?
I might try adding a computed (persisted) "BillableDollars" column to the Time table with the conversion applied:
CONVERT(MONEY,(TimeUnit*BillingRate))
I've used money but, of course, the conversion could be to whatever data type most effectively meets your needs.
I believe this will allow you to have the indexed view while summing on the calculated billable dollars.
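A sketch of what that could look like against the Time table from the question (the PERSISTED option and the COUNT_BIG column are assumptions on my part, added to keep the view indexable):
ALTER TABLE dbo.[Time]
    ADD BillableDollars AS CONVERT(MONEY, TimeUnit * BillingRate) PERSISTED
The view can then sum the precomputed column directly, which keeps the MONEY type:
SELECT
    UserID,
    SUM(TimeUnit) as Hours,
    SUM(BillableDollars) as BillableDollars,
    COUNT_BIG(*) as RowCnt          -- indexed views with GROUP BY require COUNT_BIG(*)
FROM
    dbo.[Time]
GROUP BY
    UserID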
I expect the reason you're getting a larger precision in the view's data type than in the table's data type is that the sum of a bunch of precision-9 numbers can add up to a number that needs a precision greater than 9.
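As an aside, SUM over a DECIMAL input is documented to return DECIMAL(38, s) in SQL Server regardless of how big the values actually get, which matches the decimal(38,2) and decimal(38,6) columns above; a throwaway query to confirm the resulting type:
SELECT
    SQL_VARIANT_PROPERTY(SUM(CAST(1.25 AS DECIMAL(9,2))), 'BaseType')  AS SumType,       -- decimal
    SQL_VARIANT_PROPERTY(SUM(CAST(1.25 AS DECIMAL(9,2))), 'Precision') AS SumPrecision   -- 38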
Just wrap your indexed view in a second "casting" view, like this:
CREATE VIEW MyView
AS
SELECT CAST(UserID AS INT) AS UserID
     , CAST(Hours AS DECIMAL(9,2)) AS Hours
     , CAST(BillableDollars AS MONEY) AS BillableDollars
FROM MyViewIndexed WITH (NOEXPAND)
As a bonus, you can include the NOEXPAND hint so the underlying indexed view is actually used by the query optimizer on the less "advanced" editions of MSSQL.
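Usage is then just a plain query against the wrapper (the filter is a made-up example):
SELECT UserID, Hours, BillableDollars
FROM MyView
WHERE UserID = 42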
OK,
I've been doing a lot of reading on returning a random row set, and last year the solution we came up with was
ORDER BY newid()
This is fine for <5k rows, but when we get >10-20k rows we get SQL timeouts. The execution plan tells me that 76% of my query cost comes from this line, and removing it speeds things up by an order of magnitude when we have a large number of rows.
Our users have a requirement of doing up to 100k rows at a time like this.
To give you all a bit more detail:
We have a table with 2.6 million 4-digit alphanumeric codes. We use a random set of these to gain entry into a venue. For example, if we have an event with a 5,000 capacity, a random set of 5,000 of these will be drawn from the table and issued to each customer as a bar-code, and the bar-code scanning app at the door will have the same list of 5,000. The reason for using a 4-digit alphanumeric code (and not a stupidly long number like a GUID) is that it is easy for people to write the number down (or SMS it to a friend) and just bring the number and have it entered manually, so we don't want a large number of characters. Customers love that last bit, btw.
Is there a better way than ORDER BY newid(), or is there a faster way to get 100k random rows from a table with 2.6 mil?
Oh, and we are using MS SQL 2005.
Thanks,
Jo
There is an MSDN article entitled "Selecting Rows Randomly from a Large Table" that talks about this exact problem and shows a solution (using no sorting but instead using a WHERE clause on a generated column to filter the rows).
The reason your query is slow is that the ORDER BY clause causes the whole table to be copied into tempdb for sorting.
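From memory, the filter the article describes looks roughly like the sketch below; the table and column names here are made up, and the threshold is tuned to the fraction you need (100k out of 2.6 million is roughly 4%):
SELECT *
FROM dbo.VenueCodes                                        -- hypothetical table name
WHERE ABS(BINARY_CHECKSUM(VenueCode, NEWID())) % 100 < 4   -- keeps roughly 4% of rows at random
Because there is no ORDER BY, nothing has to be spooled to tempdb for sorting; the trade-off is that the row count is approximate, so over-select a little and trim to the exact number you need.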
If you want to generate random 4-digit codes, why not just generate them instead of trying to pull them out of a database?
Generate 100k unique numbers in the range 0 to 1,679,615 (36^4 = 1,679,616 is the number of unique four-digit alphanumeric codes, ignoring case, so 2.6 million rows must contain some duplicates) and convert them to your four-digit codes.
You don't have to sort.
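The number-to-code conversion is just base 36; a quick T-SQL sketch of the digit arithmetic (123456 is an arbitrary example value):
DECLARE @n int
DECLARE @chars char(36)
SET @n = 123456
SET @chars = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
SELECT SUBSTRING(@chars, (@n / 46656) % 36 + 1, 1)      -- 36^3 place
     + SUBSTRING(@chars, (@n / 1296)  % 36 + 1, 1)      -- 36^2 place
     + SUBSTRING(@chars, (@n / 36)    % 36 + 1, 1)
     + SUBSTRING(@chars,  @n          % 36 + 1, 1) AS code   -- '2N9C' for 123456
Generating the 100k unique numbers themselves is probably easiest in application code (shuffle 0..1,679,615 and take the first 100k), which keeps the random selection out of the database entirely.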
DECLARE @RowCount int
DECLARE @Threshold float
SELECT @RowCount = COUNT(*) FROM customers
SELECT @Threshold = 50000.0 / @RowCount                 -- fraction of the table we want to keep
SELECT TOP 50000 *
FROM customers
WHERE RAND(CHECKSUM(NEWID())) < @Threshold              -- per-row random value, so only ~50k rows reach the sort
ORDER BY newid()
Just as a matter of interest, what is the performance like if you replace
ORDER BY newid()
by
ORDER BY CHECKSUM(newid())
One thought is to break the process down into steps. Add a GUID column to the table, then run an update statement to populate it with GUIDs. This can be done ahead of time if necessary. You should then be able to run the query with an ORDER BY on the GUID column to receive the results the same way, as sketched below.
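A minimal sketch of that idea (table and column names are made up; run each statement as its own batch so the new column is visible to the UPDATE):
ALTER TABLE dbo.VenueCodes ADD RandomKey uniqueidentifier
GO
UPDATE dbo.VenueCodes SET RandomKey = NEWID()                       -- can be done ahead of time
GO
CREATE INDEX IX_VenueCodes_RandomKey ON dbo.VenueCodes (RandomKey)
GO
SELECT TOP 100000 * FROM dbo.VenueCodes ORDER BY RandomKey
Note that the ordering is frozen once the GUIDs are written, so the UPDATE needs to be re-run whenever a fresh random draw is required.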
Have you tried using % (modulo) on a given int column? Not sure what your table structure is, but you could do something like this:
select top 50000 *
from your_table
where CAST((CAST(ASCII(SUBSTRING(venuecode,1,1)) as varchar(3))+
CAST(ASCII(SUBSTRING(venuecode,2,1))as varchar(3))+
CAST(ASCII(SUBSTRING(venuecode,3,1))as varchar(3))+
CAST(ASCII(SUBSTRING(venuecode,4,1))as varchar(3))) as bigint) % 500000 between 0 and 50000
The above code will take all of your alphanumeric venue codes, convert them to an integer, and then split the entire table into 500,000 buckets, of which you are taking the top 50,000 that fall between 0 and 50,000. You can play with the number after the % sign (500,000) and you can play with the BETWEEN range. This should randomize it for you. Not sure if the WHERE clause will bite you on performance, but it's worth a shot. Also, without an ORDER BY, there is no guarantee of the order (if you have multiple CPUs and threading).