Power BI combine results from two SQL-Server tables - sql-server

While using Power BI for a few months now, we (the user group) encountered an issue that is not really clear to us.
We use Power-BI with a remote SQL-Server data source, we access the data source through direct query.
Let's pretend we have 2 Tables as below-
Table name: Issue
Column:
ResolutionTime(Date/Time)
IssueID(Unique Numbers)
Table Name: WorkItem
Column:
start (Date/Time)
end (Date/Time)
IssueID (Unique Numbers, Foreign Key to "Issue" table)
Table WorkItem also contain a calculated column "WorkTime" which uses this DAX-expression as below-
WorkTime = WorkItem[end] - WorkItem[start]
The two tables are configured through Power-Bi having a two-way 1:n relationship that can be queried to collect all "WorkItem"(s) assigned to an "Issue" entry, using the "IssueID" as correlation column.
To be able to compute the aggregated "work-time" for each "WorkItem", we use a new/calculated table with the following DAX expression to aggregate the total amount of time invested for a single "Issue":
SumWork =
SUMMARIZE(
WorkItem, WorkItem[IssueID], "All work per item", SUM(WorkItem[WorkTime])
)
The above table computes the total invested work-time for a particular issue, grouping/summarizing results based on the "IssueID" foreign key. This new calculated table is also configured to have a relationship with the "Issue" table, this time a "1:1" relationship, using the IssueID as correlation column.
Now to compute the time that the issue was worked on + the time for Resolution should be summarized in a calculated column inside "Issue", but this does not work:
ResolutionAndWorkTime = Issue[ResolutionTime] + SumWork["All work per item"]
But the above DAX expression fails to compile, as it always reports that it returns "more than one result", thus not being a singular result. But that is suprising, as the two table ("Issue" and "SumWork" are related to each other with a "1:1" relationship).
Tables:
Issues
IssueID ResolutionTime ResolutionAndWorkTime
1 03:20:20 ???
2 01:20:20 ???
3 00:20:20 ???
WorkItem
IssueID start end WorkTime
1 1-2-2020 3:20:20 1-2-2020 3:25:20 00:05:00
1 2-2-2020 6:20:20 2-2-2020 7:20:20 01:00:00
3 1-3-2020 3:20:20 1-3-2020 3:29:20 00:09:00
Any ideas what to look for? Data-types? Table-definition? Table-relationships? We checked other Stackoverflow questions/answers, but no good ideas retrieved so far.
NOTE that a lot of join/merge features of Power BI are not available if direct-query is used and thus joining the tables is not really an option (we think).

You need this following code for your new Calculated column.
Visit HERE To know more about RELATED.
ResolutionAndWorkTime = Issues[ResolutionTime] + RELATED(SumWork[All work per item])

Based on input provided by "mkRabbani" (see other answer) we investigated why "RELATED" does not function as expected. The problem originates in the access to the database. As suspected earlier the function delivers the expected results once the database access is switched to "import" instead of "direct-query".
As a workaround we now joins the data inside the SQL server by using traditional database views. Of course this only works for scenarios where the database is under control of the data analytics team.

Related

Anylogic: How to create plot from database table?

In my Anylogic model I succesfully create plots of datasets that count the number of trucks arriving from terminals each hour in my simulation. Now, I want to add the actual/"observed" number of trucks arriving at a terminal, to compare my simulation to these numbers. I added these numbers in a database table (see picture below). Is there a simple way of adding this data to the plot?
I tried it by creating a variable that reads the database table for every hour and adding that to a dataset (like can be seen in the pictures below), but this did not work unfortunately (the plot was empty).
Maybe simply delete the variable and fill the dataset at the start of the model by looping through the dbase table data. Use the dbase query wizard to create a for-loop. Something like this should work:
int numEntries = (int) selectFrom(observed_arrivals).count();
DataSet myDataSet = new DataSet(numEntries);
List<Tuple> rows = selectFrom(observed_arrivals).list();
for (Tuple
row : rows) {
myDataSet.add(row.get( observed_arrivals.hour ), row.get( observed_arrivals.terminal_a ));
}
myChart.addDataSet(myDataSet);
You don't explain why it "didn't work" (what errors/problems did you get?), nor where you defined these elements.
(1) Since you want both observed (empirical) and simulated arrivals per terminal, datasets for each should be in the Terminal agent. And then the replicated plot (in Main) can have two data entries referring to data sets terminals(index).observedArrivals and terminals(index).simulatedArrivals or whatever you name them.
(2) Using getHourOfDay to add to the observed dataset is wrong because that just returns 0-23 (i.e., the hour in the current day for the current model date). Your database table looks like it has hours since model start, so you just want time(HOUR) to get the model time in elapsed hours (irrespective of what the model time unit is). Or possibly time(HOUR) - 1 if you only want to update the empirical arrivals for the hour at the end of that hour (i.e., at the same time that you updated the simulated arrivals).
(3) Using a Variable to get the database value each hour doesn't work because a variable's initial value is only evaluated once at model initialisation. You want an hourly cyclic Event in Terminal instead which adds the relevant row's value. (You need to use the Insert Database Query wizard to generate the relevant Java code for the query you need in the event's action.)
(4) Because you have a database table with specifically-named columns for each terminal (columns terminal_a and presumably terminal_b etc.) that makes it slightly more awkward. (This isn't proper relational table design where, instead of 4 columns for the 4 terminals, you'd instead have two columns for terminal_id and observed_value with a row for each time period and terminal combination.)
So your database query expression (in your Terminal agents) will need to use the SQL format (not the QueryDSL format) so that you can 'stitch in' the correct column name into the SQL.

Google Data Studio : how to obtain a SUM related to a COUNT_DISTINCT?

I have a dataset including 3 columns :
ID transac (The unique ID of the transaction - Dimension)
Source (The source of the transaction - Dimension)
Amount € (The amount of the transaction - Stat)
screenshot of my dataset
To Count the number of transactions (for one or more sources), i use COUNT_DISTINCT function
I want to make the sum of the transactions amounts (for one or more sources). But i don't want to additionate the amounts of the transactions with the same ID !
Is there a way to do this calcul with a DataStudio function ?
Thanks for your answers. :-)
EDIT : I saw that we could do this type of calculation via SQL here and I would like to do this in DataStudio (so that I don't have to pre-calculate the amounts per source.)
IMO, your dataset contains wrong data. Each value should be relative only to that line, but this is not the case: if the total is =20, each line should describe the participation of that line to the total. With 4 sources, each line should be =5 or something else that sums 20.
To solve it in DataStudio, you need something like CALCULATE function in PowerBI, but currently DataStudio doesn't support this feature.
But there are some options to consider to repair your data:
If you're sure there are always 4 sources, just create a new calculated field with the expression Amount/4 and SUM it. It is not an elegant solution, but it works.
If your data source is Google Sheets, you can easily repair the data using formulas, like in this example:
Link to spreadsheet
For this spreadsheet, I used this formula in adjusted_amount column: =C2/COUNTIF(A:A,A2). With this column in DataStudio, just use the usual SUM aggregation function to summarize it correctly.

Creating a Measure to calculate Percentage in Tableau

Normally in Tableau to calculate % I create a measure and use the below logic;
SUM ([MEASURE_1])/ SUM([MEASURE_2])
However i am trying to get the % when using a measure and an already aggregated measure and I'm getting the below error;
Argument to SUM (an aggregate function is already an aggregation, and cannot be further aggregated.
My percentage is to calculated the % difference between a policy count from table 1 which is a measure. and measure 2 which I had to create to get the count of policies from table 2 as follows; Count([Policy]). I'm using SQL Server database.
Any ideas how i can resolve this issue?
Thanks!
Measure 1 is a count from table 1, the data looks like the following;
Measure 2 is from table 2, I am creating a count in tableau of the policy number, the data looks like the below;
The 2 tables join on agent number
Table 1 structure;
[UNIQUEPOL_CNT]
,[UNIQUEAGT_CNT]
,[VECHICLE_TXT]
,[VEHICLE_DESC]
,[VEHICLE_CD]
,[CODE_DESC]
,[POLICY_ID]
,[POLICY_NBR]
,[STATE_CD]
,[AGENT_NBR]
,[AGENT_NM]
,[PROP_ID]
,[WRITTEN_DT]
,[PURCHASE_DT]
,[SUSP_IND]
,[SUSPEND_DT]
,[BIND_ID]
,[INSURED_NM]
,[CHANGEDBY_ID]
table 2 structure;
[REPORT_DT]
,[AGENT_NBR]
,[TRR_TOTALPIF_CNT]
,[TRR_TOTALPON_CNT]
,[TRR_TOTALREV_CNT]
,[TRR_TOTALERR_CNT]
This is the invalid output...
UPDATED
I created two datasets to demonstrate you a solution
dataset1
dataset 2
Connecting both datasets in Tableau.
I built a view in tableau as (Note policies field blue color as it has been converted to dimension)
Now calculate a field as
ATTR({FIXED [Agent code]: count([Policy no])}/[Policies])
Adding this to viz gives me
wherein desired percentages are displayed.
Note- since the tables have one-to-many relationships fixed statement in respect of measure on lesser side of relationship will also result in same result.
ATTR({FIXED [Agent code]: count([Policy no])}/
{FIXED [Agent code]: AVG([Policies])})

Power BI - Show all data from table1 in a visual from two many to many related tables when one other field from table2 on the X-axis

I have a key column in two many to many related table and sample representation of data is -
(attaching sample version of the table to get the point across as there are other numerous columns not contributing to this visual)
table 1 -
table 2 -
I am making a line graph with date on x axis and the value1 and value 2 on y-axis. The value1 is true for all dates. It is basically a standard target value. Now I want all the value1 summed up to show in my visual as value1 and not just the ones for which I have data on those dates. To explain it better I want 1717 on the graph as well like the total in the table in the following image -
visual -
Is there a way to do this in power BI? I tried to make a shared dimension of all unique key as a separate table and connecting both the tables to that table but there is no change in visual due to that.
You can follow these below steps to achieve your required output-
Step-1 Create a custom column in your *table 1 as below-
value_1_sum =
CALCULATE(
SUM(table_2[value1]),
ALL(table_2)
)
Step-2 Configure your line chart as below. Remember, the aggregation for new custom column will be Average as shown in the image
And here below is the final output-
Additional Reference Here below is list of options you will get after right click on the measure name-

Is there any contradiction against too many values in a table (database)?

I was wondering if there's any contradiction or futur problems against a table in a database which contains about 80 columns. There will be only VARCHARs, few INT and maybe 1 or 2 MESSAGE. I did some research on the net but there's nothing really talking about that kind of problem...In other terms, is this okay or even 'normal' to put that much of values inside a table??
Thanks in advance!
You shouldn't have any real problems if the fields are mostly integers. Most DBMSes have a limit on row length, so a bunch of long columns can cause issues...but unless the varchar columns are very long, you're probably OK.
I've honestly never even needed to think about that, though -- with a properly normalized database, it's quite rare to ever need that many columns in a table.
More columns you have, more memory server needs to process the records.
I recomend to use the "multiple to one" relation scheme in this case.
Example of tables:
customer
id
name
email
...
ins_app_form (Insurance application form)
id
customer_id (relation with customer)
date
... (here comes some other data if you need)
ins_app_item (Insurance application form items/fields)
id
ins_app_form_id (relation with Insurance application form)
question (the name of a question in application form)
answer (customer's answer)
So to show the application form with this scheme you will need to run a query:
SELECT
iaf.id AS application_id,
iaf.date AS `date`,
iai.question,
iai.answer
FROM ins_app_form AS iaf
LEFT JOIN ins_app_item AS iai ON iai.ins_app_form_id=iaf.id
WHERE iaf.customer_id=<ID of a customer>
This query will bring you something like this:
id date question answer
1 2014-03-31 "Year" "2008"
1 2014-03-31 "Car make" "Audi"
1 2014-03-31 "Car model" "Q7"
...

Resources