Google Data Studio: Is it possible to filter results in a Pivot table if less than x records in a cell? - pivot-table

I am surveying employees (asking each of them several "1-5 Opinion Scale" questions) and want to provide relative anonymity by limiting the display of results when there are fewer than (say) 6 people in the result set. GDS doesn't allow aggregate filters on pivot tables. Does anyone know of a way around this using calculated fields or some other mechanism?

In this case, I'd suggest adding this functionality at the source of your data. For example, if your data is in BigQuery, you could import it with a custom query like so:
SELECT
  *
FROM
  `your_table`
WHERE
  5 < (
    SELECT
      COUNT(*) AS num_rows
    FROM
      `your_table`)
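The query above compares the row count of the whole table against the threshold, so it returns either every row or none. If the goal is to suppress individual groups with fewer than six respondents, one possible variation counts rows per group instead. The sketch below uses Python's built-in sqlite3 module as a stand-in for BigQuery; the table `responses` and the columns `department`, `question`, and `score` are hypothetical names, not taken from the question.

```python
import sqlite3

MIN_GROUP_SIZE = 6  # threshold from the question: hide groups with fewer than 6 people


def visible_rows(conn, min_size=MIN_GROUP_SIZE):
    """Return only rows whose (department, question) group has at least min_size rows."""
    query = """
        SELECT r.department, r.question, r.score
        FROM responses AS r
        JOIN (
            SELECT department, question
            FROM responses
            GROUP BY department, question
            HAVING COUNT(*) >= ?
        ) AS g
          ON r.department = g.department AND r.question = g.question
        ORDER BY r.department, r.question, r.score
    """
    return conn.execute(query, (min_size,)).fetchall()


# Hypothetical sample data: one group large enough to show, one too small.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (department TEXT, question TEXT, score INTEGER)")
rows = [("Sales", "Q1", s) for s in (1, 2, 3, 4, 5, 3)]  # 6 respondents: kept
rows += [("Legal", "Q1", s) for s in (4, 5)]             # 2 respondents: dropped
conn.executemany("INSERT INTO responses VALUES (?, ?, ?)", rows)

result = visible_rows(conn)
```

Filtering before the data reaches the report means the small groups never appear in the pivot table at all, which is the behaviour the question asks for.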

Related

Finding difference between columns in column group

I am using Report Builder to create a report showing a budget for a project. The dataset includes line items for both budget and projected amounts; see below for example rows. I am using a matrix with a column group to display budget and projected side by side, as well as a row group to show section, category, etc. I need a variance column that subtracts projected from budget.
I have scoured the web for solutions, but nothing has worked so far. I feel like there has to be a simple solution, given that this could be done in a SQL query with zero effort. Most solutions assume I have two separate fields, but these are dynamic fields pulled out by the column group.
Dataset row samples:
Type       Section   Category  Phase             Task              Total
Budget     Building  Kitchen   Pre-Construction  Cabinet Hardware  $100
Projected  Building  Kitchen   Pre-Construction  Cabinet Hardware  $220
Report sample:
              COL GROUP
              Budget  Projected  Variance   <- this is the column I want
+Building     $100    $220       -$120
 +Kitchen
  +Pre-Con
EDIT: I tried the solution below without success and have already visited every link provided in the second answer. Maybe there is something I am missing, but I ended up doing everything in the SQL query and not using column groups. This is 100% the simplest solution. I am very surprised there is no easy way to reference individual columns in a column group. The answers below may work for others, but I could not get them to work for me. Not sure why.
You could add an additional column inside the "Type" group (provided that this is the name of your column group). Set the Column Visibility to hide the column with an expression like
=IsNothing(Previous(Fields!Type.Value, "Type"))
Calculate the values for that column as
=Previous(Sum(Fields!Total.Value), "Type") - Sum(Fields!Total.Value)
That should calculate the difference between the values of the previous type and the current type, and only show that column for the "Projected" type (when there is a previous type).
On the matrix, you can use the group subtotals to achieve this; you only have to replace the SUM operation with an expression that subtracts the two values. There are many links that explain how to do that or that may help you:
How to add calculated column from dynamic columns to a matrix
Adding subtotals to SSRS report tablix
How to write Expression to subtract row Group SubTotals
Reporting in SQL Server – Using calculated Expressions within reports
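The asker's edit mentions computing everything in the SQL query rather than in the matrix, but does not show that query. One way it could look is conditional aggregation, sketched here with Python's built-in sqlite3 module as a stand-in for SQL Server. The table name `line_items` and the lowercase column names are assumptions; the two sample rows are the ones from the question, with the totals stored as plain numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE line_items "
    "(type TEXT, section TEXT, category TEXT, phase TEXT, task TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO line_items VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Budget", "Building", "Kitchen", "Pre-Construction", "Cabinet Hardware", 100),
        ("Projected", "Building", "Kitchen", "Pre-Construction", "Cabinet Hardware", 220),
    ],
)

# Pivot the Type values into columns and compute Budget minus Projected per task.
query = """
    SELECT section, category, phase, task,
           SUM(CASE WHEN type = 'Budget'    THEN total ELSE 0 END) AS budget,
           SUM(CASE WHEN type = 'Projected' THEN total ELSE 0 END) AS projected,
           SUM(CASE type WHEN 'Budget'    THEN total
                         WHEN 'Projected' THEN -total
                         ELSE 0 END)                               AS variance
    FROM line_items
    GROUP BY section, category, phase, task
"""
variance_rows = conn.execute(query).fetchall()
```

With the data shaped this way, the report can use three ordinary fields instead of a column group, so the variance needs no cross-column expression.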

Data profiling Task - custom Profile Request

Is there any option to create a custom Profile Request for SSIS Data Profiling Task?
At the moment there are 5 standard profile requests under SSIS Data Profiling task:
Column Null Ratio Profile Request
Column Statistics Profile Request
Column Length Distribution Profile Request
Column Value Distribution Profile Request
Candidate Key Profile Request
I need to add another one (a custom one) to get a summary of all numeric values.
Thanks in advance for your help.
According to the Microsoft documentation, the SSIS Data Profiling Task has only the 5 main profiles listed in your question, and there is no option to add a custom profile.
In a similar situation, I would create an Execute SQL Task instead. You can use the aggregate functions you need, with the ISNUMERIC function in the WHERE clause:
SELECT MAX(CAST([Column] AS BIGINT))  -- Maximum value
      ,MIN(CAST([Column] AS BIGINT))  -- Minimum value
      ,COUNT([Column])                -- Count of values
      ,COUNT(DISTINCT [Column])       -- Count of distinct values
      ,AVG(CAST([Column] AS BIGINT))  -- Average
      ,SUM(CAST([Column] AS BIGINT))  -- Sum
FROM [Table]
WHERE ISNUMERIC([Column]) = 1
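To make the intended summary concrete, here is a minimal Python sketch of the same aggregates over values that parse as integers. The function name `summarize_numeric` and the sample values are hypothetical. Note one difference from the query: ISNUMERIC also returns 1 for strings such as '1e5' or '$', which cannot be cast to BIGINT, whereas this sketch simply skips anything that is not a whole number.

```python
def summarize_numeric(values):
    """Summarize the entries of `values` that parse as whole numbers."""
    nums = []
    for value in values:
        try:
            nums.append(int(str(value).strip()))
        except ValueError:
            continue  # non-numeric entry (including None): leave it out of the summary
    if not nums:
        return None
    return {
        "max": max(nums),
        "min": min(nums),
        "count": len(nums),
        "distinct": len(set(nums)),
        # Float average; T-SQL AVG over BIGINT would return an integer instead.
        "avg": sum(nums) / len(nums),
        "sum": sum(nums),
    }


# Hypothetical column contents mixing numeric and non-numeric entries.
summary = summarize_numeric(["10", "20", "abc", None, "20", "5"])
```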
I think what you want to do here is create a computed column that is populated with your source column only if IsNumeric(SourceColumn) = 1.
Then create a profile task using Column Value Distribution Profile Request on the computed column, with ValueDistributionOption set to AllValues.
Edit:
To further clarify, the computed column doesn't have to be a task in SSIS, although that's how I was thinking about it when I came up with my answer. You could ALTER the table you want to profile, adding the computed column, and then create the Profile Task as I explained above.
I was also under the assumption that you wanted to profile the values of a single column. If you want to do this for multiple columns, or need to profile summary values aggregated from detail records, then this answer may not be the best solution.

Mule - Record cannot be mapped as it contains multiple columns with the same label

I need to run a join query against an MS SQL Server 2014 database, joining on a column value. The query runs fine when executed directly against the database, but fails when run through Mule. The query looks something like this:
SELECT * FROM sch.emple JOIN sch.dept on sch.emple.empid = sch.dept.empid;
Through MuleSoft it gives the following error:
Record cannot be mapped as it contains multiple columns with the same label. Define column aliases to solve this problem (java.lang.IllegalArgumentException). Message payload is of type: String
Please help me out.
Specify the column list explicitly:
SELECT e.<col1>, e.<col2>, ...., d.<col1>,...
FROM sch.emple AS e
JOIN sch.dept AS d
  ON e.empid = d.empid;
Remarks:
You could use aliases instead of schema.table_name.
SELECT * in production code is bad practice in 95% of cases.
The duplicated column is empid (there may be more). You could add aliases for it, e.g. e.empid AS emple_empid and d.empid AS dept_empid, or just select e.empid once.
To avoid typing all the columns manually, you could drag and drop them from Object Explorer into the query pane (see "Drag and Drop Column List into query window").
A second way is to use a plugin like Redgate SQL Prompt to expand SELECT *:
Image from: https://www.simple-talk.com/sql/sql-tools/sql-server-intellisense-vs.-red-gate-sql-prompt/
Addendum
But the same query works directly.
It works because there you don't bind the columns to anything. Please read the link I provided about the SELECT * antipattern carefully, especially:
Binding Problems
When you SELECT *, it's possible to retrieve two columns of the same name from two different tables. This can often crash your data consumer. Imagine a query that joins two tables, both of which contain a column called "ID". How would a consumer know which was which? SELECT * can also confuse views (at least in some versions SQL Server) when underlying table structures change -- the view is not rebuilt, and the data which comes back can be nonsense. And the worst part of it is that you can take care to name your columns whatever you want, but the next guy who comes along might have no way of knowing that he has to worry about adding a column which will collide with your already-developed names.
by Dave Markle
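The binding problem described above can be reproduced in a few lines. This is only an illustration using Python's built-in sqlite3 module rather than Mule or SQL Server; the columns `name` and `deptname` are hypothetical. The database is happy to return two columns labelled empid, but a consumer that maps each row to a key/value record cannot represent both, which is where Mule raises its error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE emple (empid INTEGER, name TEXT);
    CREATE TABLE dept  (empid INTEGER, deptname TEXT);
    INSERT INTO emple VALUES (1, 'Ann');
    INSERT INTO dept  VALUES (1, 'Finance');
    """
)


def column_labels(sql):
    """Return the result-set column labels the driver reports for a query."""
    cursor = conn.execute(sql)
    return [column[0] for column in cursor.description]


# SELECT * yields the label "empid" twice, once per table.
star_labels = column_labels(
    "SELECT * FROM emple JOIN dept ON emple.empid = dept.empid"
)

# Explicit columns with aliases give every label a unique name.
alias_labels = column_labels(
    "SELECT e.empid AS emple_empid, e.name, d.empid AS dept_empid, d.deptname "
    "FROM emple AS e JOIN dept AS d ON e.empid = d.empid"
)

has_duplicates = len(set(star_labels)) < len(star_labels)
```

A plain dictionary built from `star_labels` would silently keep only one empid entry; Mule chooses to fail instead, which is why aliasing the columns resolves the error.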

Retrieve a uniform data sample from a table

I'm trying to do some analysis on data stored in a SQL table in an external application. However, there is simply too much data to retrieve all of the relevant rows.
I'm trying to get around this by only fetching a small, uniform sample of the data.
Is there a straightforward way to accomplish this?
You can use TABLESAMPLE, e.g.
select * from [yourtable] tablesample(10 percent)
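Keep in mind that TABLESAMPLE in SQL Server selects whole data pages, so the percentage is approximate and the rows are not sampled independently. If a fixed-size sample that is uniform per row is needed and the rows can be streamed through a cursor, reservoir sampling is one alternative technique (it is not what the answer above proposes). It still reads every row once, but keeps only k rows in memory. The function name and parameters below are illustrative.

```python
import random


def reservoir_sample(rows, k, rng=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = rng or random.Random()
    sample = []
    for index, row in enumerate(rows):
        if index < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace an existing entry with probability k / (index + 1).
            slot = rng.randint(0, index)  # inclusive on both ends
            if slot < k:
                sample[slot] = row
    return sample


# Usage with a stand-in row source; a database cursor could be passed the same way.
picked = reservoir_sample(range(1000), 10, rng=random.Random(0))
```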

XQuery in Sql Server 2005 - find node in same position

I'm trying to solve the following problem, which arises from serializing objects that represent calculations into the database.
I'm trying to perform a query for reporting purposes and need to find the node in the same position in a different part of the XML hierarchy (these come from serialization of string[] and double[] attributes of the object).
For example I have something like
...<parent>
  <Names>
    <string>Name1</string>
    <string>Name2</string>
    <string>Name3</string>
  </Names>
and
...<parent>
  <Weights>
    <double>0.5</double>
    <double>0.13</double>
    <double>0.2</double>
  </Weights>
I wish to be able to query the XML blob and pull out Name-Weight pairs for each XML blob so that I can query in SQL rather than have to deserialize objects. I can pull out the Names and I can pull out the weights but if I combine them it comes out as a crossed query as I am struggling to positionally match them up. I thought the answer is perhaps to create two views, one for names and one for weights, and join them on position but position() is not allowed in the query unless it's something like [position() < 6].
Solved the problem by creating 2 separate views and then an aggregating view. I used
ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Id) as ItemPosition
as one of the columns in each query/view. I then joined on Id and ItemPosition. Not sure if it's the best way of doing it but at least it's matching up relevant items.
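For comparison, the positional pairing the question is after can be sketched outside the database with Python's standard xml.etree.ElementTree module. This is not the SQL Server 2005 solution described above, only an illustration of matching items by position. It assumes that Names and Weights sit under the same parent element, which the elided XML in the question does not confirm.

```python
import xml.etree.ElementTree as ET

# Assumed shape: both lists under one <parent>; values are the ones from the question.
XML_BLOB = """
<parent>
  <Names>
    <string>Name1</string>
    <string>Name2</string>
    <string>Name3</string>
  </Names>
  <Weights>
    <double>0.5</double>
    <double>0.13</double>
    <double>0.2</double>
  </Weights>
</parent>
"""


def name_weight_pairs(xml_text):
    """Pair the i-th name with the i-th weight, relying on document order."""
    root = ET.fromstring(xml_text)
    names = [element.text for element in root.findall("./Names/string")]
    weights = [float(element.text) for element in root.findall("./Weights/double")]
    if len(names) != len(weights):
        raise ValueError("Names and Weights have different lengths")
    return list(zip(names, weights))


pairs = name_weight_pairs(XML_BLOB)
```

The same idea underlies the view-based answer: each item gets a position within its blob, and the two lists are joined on that position.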
