ON COLUMNS compared to ON ROWS - sql-server

Why in MDX is it ok to do the following:
SELECT
[MyDim].[MyHier].[MyLevel] ON COLUMNS
FROM [CubeName]
But not the following:
SELECT
[MyDim].[MyHier].[MyLevel] ON ROWS
FROM [CubeName]

I've never found any good reason for that ;-) ON COLUMNS and ON ROWS define the 'shape' of the result (the tuples being exactly the same), and it has been decided that columns only is OK but rows only is not; i.e.:
a | b | c
1 | 2 | 3
but not:
a | 1
b | 2
c | 3
This is weird, as I see no problem with returning those values from an MDX server implementation point of view. By the way, a query with no columns and no rows is valid:
select from [cube]

It seems that there is a hierarchy of axes. Columns is the first and minimal part in defining a tuple:
When you specify an axis for a set (in this case composed of a single tuple) in a query, you must begin by specifying a set for the column axis before specifying a set for the row axis. The column axis can also be referred to as axis(0) or simply 0.
So you should first define Columns, then Rows, then Pages and so on.
<SELECT query axis clause> ::=
[ NON EMPTY ] Set_Expression
[ <SELECT dimension property list clause> ]
ON
Integer_Expression
| AXIS(Integer)
| COLUMNS
| ROWS
| PAGES
| SECTIONS
| CHAPTERS

Related

Does taking advantage of dynamic columns in Cassandra require duplicated data in each row?

I've been trying to understand how one would model time series data in Cassandra, as shown in the image below from a popular System Design Interview video, where view counts are stored hourly.
While I would think the schema for this time series data would be something like the below, I don't believe this would lead to data actually being stored in the way the screenshot shows.
CREATE TABLE views_data (
    video_id uuid,
    channel_name varchar,
    video_name varchar,
    viewed_at timestamp,
    count int,
    PRIMARY KEY (video_id, viewed_at)
);
Instead, I'm assuming it would lead to something like this (inspired by DataStax), where technically there is a single row for each video_id, but the other columns, such as channel_name and video_name, seem like they would all be duplicated within the row for each unique viewed_at.
[cassandra-cli]
list views_data;
RowKey: A
=> (channel_name='System Design Interview', video_name='Distributed Cache', count=2, viewed_at=1370463146717000)
=> (channel_name='System Design Interview', video_name='Distributed Cache', count=3, viewed_at=1370463282090000)
=> (channel_name='System Design Interview', video_name='Distributed Cache', count=8, viewed_at=1370463282093000)
-------------------
RowKey: B
=> (channel_name='Some other channel', video_name='Some video', count=4, viewed_at=1370463282093000)
I assume this is still considered a dynamic wide row, as we're able to expand the row for each unique (video_id, viewed_at) combination. But it seems less than ideal that we need to duplicate the extra information such as channel_name and video_name.
Is the screenshot of modeling time series data misleading, or is it actually possible to have dynamic columns where certain columns in the row do not need to be duplicated?
If I were upserting time series data to this row, I wouldn't want to have to provide the channel_name and video_name for every single upsert; I would just want to provide the count.
No, it is not necessary to duplicate the values of columns within the rows of a partition. It is possible to model your table to accommodate your use case.
In Cassandra, there is a concept of "static columns" -- columns which have the same value for all rows within a partition.
Here's the schema of an example table that contains two static columns, colour and item:
CREATE TABLE statictbl (
    pk int,
    ck text,
    c int,
    colour text static,
    item text static,
    PRIMARY KEY (pk, ck)
);
In this table, all rows within a partition share the same colour and item. For example, partition pk=1 has the same colour='red' and item='apple' for all rows:
pk | ck | colour | item | c
----+----+--------+--------+----
1 | a | red | apple | 12
1 | b | red | apple | 23
1 | c | red | apple | 34
If I insert a new partition pk=2:
INSERT INTO statictbl (pk, ck, colour, item, c) VALUES (2, 'd', 'yellow', 'banana', 45)
we get:
pk | ck | colour | item | c
----+----+--------+--------+----
2 | d | yellow | banana | 45
If I then insert another row without specifying a colour and item:
INSERT INTO statictbl (pk, ck, c) VALUES (2, 'e', 56)
the new row with ck='e' still has the colour and item populated even though I didn't insert a value for them:
pk | ck | colour | item | c
----+----+--------+--------+----
2 | d | yellow | banana | 45
2 | e | yellow | banana | 56
In your case, if you declare the channel and video names as static, they will share the same value for all rows in a given partition, and you only need to insert them once. Note that when you update the value of a static column, ALL the rows in that partition will reflect the updated value.
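To make the semantics concrete, here is a toy Python model of a single partition. It is only a sketch of the behaviour described above, not how Cassandra stores data internally; the class `Partition` and its methods are made up for illustration, and the sample values are taken from the question.

```python
class Partition:
    """Toy model of one partition (one video_id). NOT Cassandra's storage engine."""

    def __init__(self):
        self.static = {}   # static columns: stored once for the whole partition
        self.rows = {}     # clustering key (viewed_at) -> regular columns

    def upsert(self, viewed_at, static=None, **regular):
        # Static values, when supplied, overwrite the single shared copy.
        if static:
            self.static.update(static)
        self.rows.setdefault(viewed_at, {}).update(regular)

    def select(self):
        # Every returned row shows the shared static values plus its own columns.
        return [
            {"viewed_at": ts, **self.static, **cols}
            for ts, cols in sorted(self.rows.items())
        ]


video_a = Partition()
# First write supplies the names once...
video_a.upsert(
    1370463146717000,
    static={"channel_name": "System Design Interview",
            "video_name": "Distributed Cache"},
    count=2,
)
# ...later writes only supply the count.
video_a.upsert(1370463282090000, count=3)
rows = video_a.select()

# Updating a static value is reflected in ALL rows of the partition.
video_a.upsert(1370463282090000, static={"video_name": "Renamed"})
rows_after_rename = video_a.select()
```

In this model the second upsert never mentions the names, yet both selected rows carry them, which mirrors the `statictbl` example above.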
For details, see Sharing a static column in Cassandra. Cheers!

Report Builder Multiple Group Matrix Calculation

I have a matrix created using the Matrix Wizard in Report Builder 3.0 (2014), with 2 row groups, 1 column group and 2 values. After creating the matrix (with totals and subtotals included), it looks just fine. But now I want to add one more cell for each column group (one row) to hold the value below.
Value = Total of 1st row group + Total of 2nd row group - Total of 3rd row group ...
The matrix as built just shows me the subtotal of each row group, which I don't need.
I want to ask: how do I retrieve the totals calculated by the matrix itself, and how do I identify them by their row group value in an expression? Also, how do I do this for every column group, each of which has different data?
I tried looking at the expressions in the design view of the matrix, but it just shows [SUM(MyField)] for every cell (total and subtotal).
Or should I do it in another dataset using another query? If so, what query should I use, and how do I put two datasets into one matrix?
My Matrix looks something like this :
                                     Column Group
ROW GROUP 1 | ROW GROUP 2          | VALUE 1                      | VALUE 2
Row Group 1 | Row Group 2          | [Sum(MyField)]               | [Sum(MyField)]
            | TOTAL OF ROW GROUP 1 | [Value]                      | [Value]
            | ROW PLAN TO ADD      | [Value(0)+Value(1)-Value(2)] | [Value(0)+Value(1)-Value(2)]
CAPITALS : Column name, constant
[sqrbrkted] : Calculated Value
Normal : data inside table
I am new to Report Builder, sorry if I made any mistake. In case I didn't make myself clear, please do comment and let me know. Thank you in advance.
EDIT: I have figured out an approach that achieves my purpose; see the answer section below. If anyone has another solution, please feel free to post it. Thanks.
I solved this problem by changing my SQL query to add a dummy column that holds the negated value when its row group is the one to be deducted from the total (Value(2)), and placing that column inside the matrix. In the cell expression, I use an IIF statement to convert it back to positive for display purposes.
SQL:
SELECT *, (CASE WHEN COL_B = 'VAL_2' THEN -COL_A ELSE COL_A END) AS DMY_COL_A FROM TABLE
Expression:
=IIF(Sum(Fields!DMY_COL_A.Value)<0,-Sum(Fields!DMY_COL_A.Value),Sum(Fields!DMY_COL_A.Value))
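The arithmetic behind the dummy column can be checked outside Report Builder. This is a minimal sketch using Python's built-in sqlite3; the table name `src`, the group labels 'VAL_0' and 'VAL_1', and all the numbers are made up for illustration. Only 'VAL_2', COL_A, COL_B and DMY_COL_A come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (COL_B TEXT, COL_A INTEGER)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [("VAL_0", 10), ("VAL_0", 5), ("VAL_1", 7), ("VAL_2", 4), ("VAL_2", 2)],
)

# Same CASE expression as in the answer: negate the group that must be deducted.
SIGNED = (
    "SELECT *, (CASE WHEN COL_B = 'VAL_2' THEN -COL_A ELSE COL_A END) "
    "AS DMY_COL_A FROM src"
)

# Per-group sums of the dummy column (what each subtotal cell would aggregate).
per_group = dict(
    conn.execute(f"SELECT COL_B, SUM(DMY_COL_A) FROM ({SIGNED}) GROUP BY COL_B")
)

# A plain SUM over everything now yields Value(0) + Value(1) - Value(2).
added_row = conn.execute(f"SELECT SUM(DMY_COL_A) FROM ({SIGNED})").fetchone()[0]

# Equivalent of the IIF expression: flip negative sums back for display.
displayed = {group: (-s if s < 0 else s) for group, s in per_group.items()}

print(per_group, added_row, displayed)
```

With these sample numbers the group totals are 15, 7 and 6, so the added row comes out as 15 + 7 - 6 = 16, while the 'VAL_2' subtotal is still displayed as a positive 6.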

Group By with Where clause in SQL

I get different results when a DateTime column is included in the GROUP BY. Can someone explain why?
Query:
Select Name, Source, Description, CreatedDate
From testTable
Where Source like '%Validating err%'
And CreatedDate >='2016-12-01'
Group By Name , Source, Description, CreatedDate
Result : 15 rows
The above query returns 15 rows. But when I remove the CreatedDate column from the GROUP BY clause, it returns only 4 rows.
Query:
Select Name, Source, Description
From testTable
Where Source like '%Validating err%'
And CreatedDate >='2016-12-01'
Group By Name , Source, Description
Result : 4 rows
I am adding this answer just for the benefit of @John, so he can visually understand why his two result sets have differing numbers of records.
Imagine a table called shirts, which has only two columns, size and color. Here is some sample data:
size | color
S | red
S | green
S | blue
M | red
M | green
M | blue
L | red
L | green
L | blue
In other words, there are three sizes of shirts, and each size has three possible colors.
Now, if you execute the following query:
SELECT size
FROM shirts
GROUP BY size
you will get three records back, containing only the three sizes. However, if you do the following:
SELECT size, color
FROM shirts
GROUP BY size, color
then you would get back nine records, or groups. All that is happening here is that the addition of another column creates new possible group combinations, and hence more groups. The same concept applies to what you are seeing in your two queries: every distinct CreatedDate value splits a (Name, Source, Description) group further.
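If you want to verify the counts yourself, here is a small sketch using Python's built-in sqlite3 with the shirts data from above. The helper name `count_groups` is just something I made up for this example.

```python
import sqlite3


def count_groups(conn, columns):
    """Return how many groups GROUP BY produces for the given column list."""
    col_list = ", ".join(columns)
    rows = conn.execute(
        f"SELECT {col_list} FROM shirts GROUP BY {col_list}"
    ).fetchall()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shirts (size TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO shirts VALUES (?, ?)",
    [(s, c) for s in ("S", "M", "L") for c in ("red", "green", "blue")],
)

groups_by_size = count_groups(conn, ["size"])
groups_by_size_color = count_groups(conn, ["size", "color"])
print(groups_by_size, groups_by_size_color)  # 3 9
```

Each extra column in the GROUP BY can only keep the number of groups the same or increase it, never reduce it.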

SQLite delete rows based on multiple columns

I'm pretty new to SQLite, hence this question!
I need to remove rows from a table so that only the earliest occurrence of each unique value in column X (colour), based on column Y (time), remains.
Basically I have this:
test | colour | time(s)
one | Yellow | 8
one | Red | 6
one | Yellow | 10
two | Red | 4
I want to remove rows so that it looks like this:
test | colour | time(s)
one | Yellow | 8
two | Red | 4
Thanks in advance!
EDIT: To be clearer, I need to retain the earliest occurrence in time of each colour, regardless of the test.
EDIT: I can select the rows I want to keep by doing this:
select * from ( select * from COL_TABLE order by time desc) x group by colour;
which produces the desired result, but I want to remove everything that is not in the result of that select.
EDIT: The following worked thanks to @JimmyB:
DELETE FROM COL_TABLE WHERE EXISTS ( SELECT * FROM COL_TABLE t2 WHERE COL_TABLE.colour = t2.colour AND COL_TABLE.test = t2.test AND COL_TABLE.time > t2.time )
You can include subqueries (EXISTS/NOT EXISTS) in the WHERE clause of a DELETE statement.
Like subqueries in SELECTs, these can refer to the table in the outer statement to create matches.
In your case, try this:
DELETE FROM my_table
WHERE EXISTS (
SELECT *
FROM my_table t2
WHERE my_table.colour = t2.colour
AND my_table.test = t2.test
AND my_table.time > t2.time
)
This statement uses three noteworthy constructs:
Subquery in DELETE
Self-join
Emulation of a MIN(...), via self-join
The subquery with EXISTS is mentioned above.
The self-join is required whenever one row of a table must be compared against other rows of the same table. Finding the minimum value of some column is exactly that.
Normally, you'd use the MIN(...) function to find the minimum. The minimum can be defined as the single value for which no lower value exists, and that's what we're using here because we're not actually interested in the actual value but only want to identify the record which contains that value.
(Since we're deleting, our SELECT yields all the non-minimum rows, which we want to delete to keep only the minimums.)
So, what the statement says is:
Delete all records from my_table for which there is at least one record in my_table with the same colour and the same test but a lower time.
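The sentence above can be tried against the sample data from the question. This is a minimal sketch using Python's built-in sqlite3; the function `keep_earliest` and its `match_test` flag are made up for illustration. The flag only toggles the "same test" condition, so you can compare the statement as described with a colour-only variant (the question's edit asks for the earliest time per colour regardless of the test).

```python
import sqlite3

ROWS = [
    ("one", "Yellow", 8),
    ("one", "Red", 6),
    ("one", "Yellow", 10),
    ("two", "Red", 4),
]


def keep_earliest(match_test):
    """Delete every row for which a matching row with a lower time exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (test TEXT, colour TEXT, time INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", ROWS)
    # Optional extra match condition (hypothetical switch for this sketch).
    test_clause = "AND my_table.test = t2.test" if match_test else ""
    conn.execute(f"""
        DELETE FROM my_table
        WHERE EXISTS (
            SELECT 1
            FROM my_table t2
            WHERE my_table.colour = t2.colour
              {test_clause}
              AND t2.time < my_table.time
        )
    """)
    return conn.execute(
        "SELECT test, colour, time FROM my_table ORDER BY time"
    ).fetchall()


per_colour_and_test = keep_earliest(match_test=True)
per_colour_only = keep_earliest(match_test=False)
print(per_colour_and_test)
print(per_colour_only)
```

With the "same test" condition, one/Red/6 survives because it is the earliest Red within test 'one'. Without it, only one/Yellow/8 and two/Red/4 remain, which is the result shown in the question.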

I want to dynamically change the lookup columns in look up transformer in SSIS

I want to map the lookup columns dynamically in the Lookup transformation in SSIS.
In my task, the lookup columns will always change.
For Example:
TableA
-------
Col1 | Col2 | Col3
-----+------+-----
1    | 2    | 3
2    | 1    | 4
3    | 2    | 1
This time the lookup columns are Col1+Col2.
The next day they will change to Col2+Col3.
I want to map dynamic input column with ssn
Depending on how the lookup columns are defined from one day to the next, it may be easier to make the query in the Lookup transformation dynamic instead. Check out the following link:
https://suneethasdiary.wordpress.com/2011/12/28/creating-a-dynamic-query-in-lookup-transformation-in-ssis/
