Inner Join tables between a date range in QlikView - inner-join

How can I accomplish an inner join between two tables via a date field which ranges between two values in QlikView?
In SQL this is possible with something like:
INNER JOIN TableA ON (TableA.Dates BETWEEN TableB.Start_Date AND TableB.End_Date)
In QlikView I have something like this:
DatesData:
LOAD * Inline [
Test_Date
11/1/2013
12/1/2013
1/1/2014
2/1/2014
3/1/2014
4/1/2014
5/1/2014
];
PersonData:
LOAD * Inline [
ID, Start_Date, End_Date
1, 12/1/2013, 2/1/2014
2, 1/1/2013, 3/1/2014
3, 2/1/2014, 4/1/2014
];
I need to create a table like this:
ID, Dates
1, 12/1/2013
1, 1/1/2014
1, 2/1/2014
2, 1/1/2014
2, 2/1/2014
2, 3/1/2014
etc.....
How can I accomplish a join like this?

The answer depends on how many ranges your PersonData table holds. If there are only a few, and they are fixed and do not change between reloads, you could probably get away with an if statement.
However, for more than five entries this becomes unwieldy. In that case, QlikView has an equivalent to SQL's BETWEEN operator called IntervalMatch. It behaves slightly differently from BETWEEN, since you cannot call it in an expression, but the principle is the same.
The script below uses IntervalMatch to match the ranges in PersonData to the dates in DatesData by creating a link table. IntervalMatch tends to create synthetic keys/tables when left to its own devices, which is why we follow the inner join with another join into DatesData from PersonData (try leaving the second join out and viewing how the tables are linked).
Finally we drop PersonData as all the required fields are already in DatesData.
The only side-effect of this method is that you then have Start_Date and End_Date in your main table. However, you can quickly remedy this by adding a DROP FIELDS Start_Date, End_Date line to your script.
DatesData:
LOAD * Inline [
Test_Date
11/1/2013
12/1/2013
1/1/2014
2/1/2014
3/1/2014
4/1/2014
5/1/2014
];
PersonData:
LOAD * Inline [
ID, Start_Date, End_Date
1, 12/1/2013, 2/1/2014
2, 1/1/2013, 3/1/2014
3, 2/1/2014, 4/1/2014
];
INNER JOIN (DatesData)
IntervalMatch (Test_Date)
LOAD
Start_Date,
End_Date
Resident PersonData;
JOIN (DatesData)
LOAD
*
RESIDENT PersonData;
DROP TABLE PersonData;
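To make the matching rule concrete outside QlikView, here is a minimal Python sketch of what the interval join produces for the sample data. It is only an illustration of the logic, not how QlikView implements it; the names interval_join, test_dates and person_data are made up for this example, and it assumes the intervals are closed (both end points included).

```python
from datetime import date

# Sample data copied from the question (dates there are month/day/year).
test_dates = [
    date(2013, 11, 1), date(2013, 12, 1), date(2014, 1, 1), date(2014, 2, 1),
    date(2014, 3, 1), date(2014, 4, 1), date(2014, 5, 1),
]
person_data = [
    (1, date(2013, 12, 1), date(2014, 2, 1)),
    (2, date(2013, 1, 1), date(2014, 3, 1)),
    (3, date(2014, 2, 1), date(2014, 4, 1)),
]

def interval_join(dates, intervals):
    """Pair each ID with every date that falls inside its closed interval."""
    return [
        (person_id, d)
        for person_id, start, end in intervals
        for d in dates
        if start <= d <= end
    ]

rows = interval_join(test_dates, person_data)
for person_id, d in rows:
    print(person_id, d.isoformat())
```

Note that ID 2 starts on 1/1/2013 in the sample data as posted, so it also picks up the two 2013 dates. 5/1/2014 falls in no interval and drops out, which is the inner join behaviour.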

Related

Aggregate query - column for querying affects aggregation

I have a table "Scores" with fields as follows:
UserId
LessonId
ExerciseId
Score
Timestamp
I want to setup a view, "vw_AggregateScoreForUser" that will aggregate data from that table, as follows:
SELECT UserId,
LessonId,
COUNT(ExerciseId) AS TotalExercises,
SUM(Score) AS TotalScore,
COUNT(DISTINCT CONVERT(date, Timestamp)) AS StudyDays
FROM Scores
GROUP BY UserId, LessonId
The tricky bit is StudyDays, where I'm counting the unique dates that the user has at least one entry here on - that gives me the days that they "studied", i.e. completed at least one exercise.
Now, say that I want to execute this view for lessons 1 to 5.
SELECT * FROM vw_AggregateScoreForUser WHERE UserId = 1 AND LessonId BETWEEN 1 AND 5;
What I want, is one record returned that aggregates the data for those 5 lessons. But with the above setup, the data is grouped by LessonId, so I will get 5 records back.
The issue is that StudyDays may now be incorrect as it's computed per lesson. E.g. with the following data:
UserId LessonId ExerciseId ... Timestamp
1 1 1 2019-11-21 09:00
1 1 2 2019-11-22 10:00
1 2 1 2019-11-22 11:00
I would get the result
UserId LessonId TotalExercises ... StudyDays
1 1 2 2
1 2 1 1
I can't simply add StudyDays to get the number of days studied. That would give me 3, but the distinct count for StudyDays overall should be 2.
The issue is that I need LessonId in the view in order to be able to use it in the WHERE clause, but having it in the view will group my data by lesson causing the aggregate to be incorrect.
How do you include a field in a view so that you can filter on it, without having it affect the aggregation that occurs in that view?
Some aggregates can't be stacked in multiple levels, because they give different results. A count-distinct over per-group count-distincts isn't the same as a count-distinct over the original set. The same happens with averages, which depend on the number of rows in each group.
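A small Python sketch of that effect, using the three sample rows from the question (variable names are made up for the illustration):

```python
from datetime import date

# (user_id, lesson_id, exercise_id, study_date) rows from the question's sample.
scores = [
    (1, 1, 1, date(2019, 11, 21)),
    (1, 1, 2, date(2019, 11, 22)),
    (1, 2, 1, date(2019, 11, 22)),
]

# Distinct study days computed per lesson, as the grouped view does.
days_by_lesson = {}
for _user, lesson, _exercise, day in scores:
    days_by_lesson.setdefault(lesson, set()).add(day)

# Adding up the per-lesson counts double-counts days shared between lessons.
sum_of_per_lesson_counts = sum(len(days) for days in days_by_lesson.values())

# Counting distinct days over the original rows gives the real figure.
overall_distinct_days = len({day for *_rest, day in scores})

print(sum_of_per_lesson_counts, overall_distinct_days)
```

The per-lesson counts add up to 3, while the distinct count over the raw rows is 2, because 2019-11-22 appears under both lessons.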
The problem in your case is the GROUP BY LessonID with a COUNT DISTINCT inside the view. You are already computing values by LessonID when you want (later on) multiple LessonID values to be computed together as a set.
As long as you keep the GROUP BY inside the view, you will have this problem. One solution is to replace the view with a table-valued function, which allows a range of lessons to be supplied:
CREATE FUNCTION dbo.ufnUserLessonSummary (
@UserID INT,
@LessonIDFrom INT,
@LessonIDTo INT)
RETURNS TABLE
AS RETURN
SELECT
S.UserId,
COUNT(S.ExerciseId) AS TotalExercises,
SUM(S.Score) AS TotalScore,
COUNT(DISTINCT CONVERT(date, S.Timestamp)) AS StudyDays
FROM
Scores AS S
WHERE
S.UserID = @UserID AND
S.LessonID BETWEEN @LessonIDFrom AND @LessonIDTo
-- Group by user only, so the whole lesson range is aggregated into one row
GROUP BY
S.UserId
You can query it like the following:
SELECT
S.*
FROM
dbo.ufnUserLessonSummary(1, 1, 5) AS S
However, this is limited to a range of lessons. What happens if you want only lessons 1, 3 and 5? Another more complex, but more versatile option is to use an SP with a pre-loaded input table:
CREATE PROCEDURE dbo.uspUserLessonSummary
AS
BEGIN
-- Expects the caller to have created and filled #UserLesson beforehand
SELECT
S.UserId,
COUNT(S.ExerciseId) AS TotalExercises,
SUM(S.Score) AS TotalScore,
COUNT(DISTINCT CONVERT(date, S.Timestamp)) AS StudyDays
FROM
Scores AS S
INNER JOIN #UserLesson AS U ON
S.UserID = U.UserID AND
S.LessonID = U.LessonID
-- Columns are qualified with S. because UserID also exists in #UserLesson
GROUP BY
S.UserId
END
You can supply which records you want by loading the temporary table before executing:
IF OBJECT_ID('tempdb..#UserLesson') IS NOT NULL
DROP TABLE #UserLesson
CREATE TABLE #UserLesson (
UserID INT,
LessonID INT)
INSERT INTO #UserLesson (
UserID,
LessonID)
VALUES
(1, 1),
(1, 2),
(1, 3),
(1, 4),
(1, 5)
EXEC dbo.uspUserLessonSummary
You can also use table variables with this approach (passed in as a table-valued parameter, since a procedure cannot see the caller's table variables).
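The underlying idea (filter to the lessons you want first, then aggregate once) can be tried out quickly with Python's built-in sqlite3 module. This is only a sketch under assumptions: it uses SQLite rather than SQL Server, so date(Timestamp) stands in for CONVERT(date, Timestamp), the Score values are invented because the question elides them, and summarize is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Scores "
    "(UserId INT, LessonId INT, ExerciseId INT, Score INT, Timestamp TEXT)"
)
# Timestamps come from the question; the Score values are made up.
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 10, "2019-11-21 09:00"),
        (1, 1, 2, 20, "2019-11-22 10:00"),
        (1, 2, 1, 30, "2019-11-22 11:00"),
    ],
)

def summarize(user_id, lesson_ids):
    """Aggregate one user's rows over an arbitrary set of lessons."""
    # Only the "?" placeholders are interpolated; the values are bound safely.
    placeholders = ", ".join("?" for _ in lesson_ids)
    sql = f"""
        SELECT COUNT(ExerciseId), SUM(Score), COUNT(DISTINCT date(Timestamp))
        FROM Scores
        WHERE UserId = ? AND LessonId IN ({placeholders})
    """
    return conn.execute(sql, [user_id, *lesson_ids]).fetchone()

summary = summarize(1, [1, 2])
print(summary)  # (TotalExercises, TotalScore, StudyDays)
```

Because there is no GROUP BY on LessonId, lessons 1 and 2 together give a single row with StudyDays = 2 rather than two rows that would add up to 3.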

How to accumulate two datetime in two tables as VIEW in SQL Server 2014?

How to query to accumulate two datetime columns in two tables in SQL Server 2014? This is an example for your reference:
Check-In table
InID UserID CheckInTime
---------------------------------
IN-001 1 2018-11-10 08:00:00
IN-002 2 2018-11-15 07:00:00
Check-Out table
OutID UserID CheckOutTime
----------------------------------
OUT-001 1 2018-11-10 12:00:00
OUT-002 2 2018-11-15 14:00:00
Result set (expected)
ResultID UserID InID OutID WorkTimeinHour
--------------------------------------------------------
1 1 IN-001 OUT-001 4
2 2 IN-002 OUT-002 7
Similar to @PSK, I used the STUFF function to strip the "IN-" and "OUT-" prefixes.
Because these operations sit in the JOIN condition, they will hurt performance.
It is better to have a numeric column in both tables instead of string columns carrying the "IN-" and "OUT-" prefixes.
select
i.UserId, i.InID, CheckInTime, o.OutID, CheckOutTime,
dbo.fn_CreateTimeFromSeconds(DATEDIFF(ss, CheckInTime, CheckOutTime)) as TotalTime
from CheckIn i
inner join CheckOut o
on i.UserId = o.UserId and
STUFF (i.InID,1,3,'') = STUFF (o.OutID,1,4,'')
Additionally, I used a custom user-defined function, fn_CreateTimeFromSeconds, to format the time as HH:MI:SS.
Hope it helps
For your current scenario, you can try the following.
This assumes that the part of the IN and OUT IDs after the "-" is the same for one entry.
SELECT ROW_NUMBER()
OVER(
ORDER BY (SELECT NULL)) AS ResultID,
T1.userid,
T1.inid,
T2.outid,
-- start time first, end time second, otherwise the difference is negative
DATEDIFF(hh, T1.checkintime, T2.checkouttime) AS WorkTimeinHour
FROM checkin T1
INNER JOIN checkout T2
ON REPLACE(T1.inid, 'IN-', '') = REPLACE(T2.outid, 'OUT-', '')
This query will not perform well on large data sets because REPLACE is used in the JOIN condition. Ideally you should have a single identifier to match the IN and OUT transactions.
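For checking the expected numbers, here is a rough Python sketch of the same pairing rule: match on UserID plus the part of the ID after the "-", then take the elapsed hours. The helper name work_hours and the tuple layout are assumptions for the illustration only.

```python
from datetime import datetime

# Rows from the question: (id, user_id, timestamp).
check_ins = [
    ("IN-001", 1, datetime(2018, 11, 10, 8, 0)),
    ("IN-002", 2, datetime(2018, 11, 15, 7, 0)),
]
check_outs = [
    ("OUT-001", 1, datetime(2018, 11, 10, 12, 0)),
    ("OUT-002", 2, datetime(2018, 11, 15, 14, 0)),
]

def work_hours(ins, outs):
    """Pair check-ins with check-outs on (user, numeric suffix)."""
    outs_by_key = {
        (user, out_id.split("-", 1)[1]): (out_id, t_out)
        for out_id, user, t_out in outs
    }
    result = []
    for in_id, user, t_in in ins:
        match = outs_by_key.get((user, in_id.split("-", 1)[1]))
        if match is None:
            continue  # inner-join behaviour: unmatched check-ins are skipped
        out_id, t_out = match
        hours = (t_out - t_in).total_seconds() / 3600
        result.append((len(result) + 1, user, in_id, out_id, hours))
    return result

rows = work_hours(check_ins, check_outs)
for row in rows:
    print(row)
```

This yields 4 and 7 hours for the two sample users. One difference to keep in mind: the sketch measures elapsed time, whereas DATEDIFF(hh, ...) counts hour boundaries crossed, so the two can disagree when the times are not on the hour.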

Query Most Recent Records in MS Access Based on Date Provided in Form Field

Let me start by noting I have spent a few days searching through S.O. and have not been able to find a solution. I apologize in advance if the solution is very simple, but I am still learning and appreciate any help I can get.
I have a MS Access 2010 Database, and I am trying to create a set of queries to inform other forms and queries. There are two tables: Borrower Contact Info (BC_Info) and Basic Financial Indicators (BF_Indicators). Each month, I review and track key performance metrics of each borrower. I would like to create a query that supplies the most recent record based on a textbox input (Forms![Portfolio_Review Menu]!Text47).
Two considerations have separated this from other posts I have seen in the 'greatest-n-per-group' tag:
- Not every borrower will have data for every month.
- I need to be able to see back in time, i.e. if it is January 1, 2019 and I want to see the metrics as of July 31, 2017, I want to make sure I am only seeing data from before July 31, 2017 but as close to this date as possible.
Fields are as follows:
BC_Info
- BorrowerName
- PartnerID
BF_Indicators
- Fin_ID
- DateUpdated
The tables are connected by BorrowerName -- which is a unique naming convention used for the primary key of BC_Info.
What I currently have is:
SELECT BCI.BorrowerName, BCI.PartnerID, BFI.Fin_ID, BFI.DateUpdated
FROM ((BC_Info AS BCI
INNER JOIN BF_Indicators AS BFI
ON BFI.BorrowerName = BCI.BorrowerName)
INNER JOIN
(
SELECT Fin_ID, MAX(DateUpdated) AS MAX_DATE
FROM BF_Indicators
WHERE (DateUpdated <= Forms![Portfolio_Review Menu]!Text47 OR
Forms![Portfolio_Review Menu]!Text47 IS NULL)
GROUP BY Fin_ID
) AS Last_BF ON BFI.Fin_ID = Last_BF.Fin_ID AND
BFI.DateUpdated = Last_BF.MAX_DATE);
This gives me the fields I need, and will keep records out that are past the date given in the textbox, but will give all records from before the textbox input -- not just the most recent.
Results (Date Entered is 12/31/2018; MEHN-45543 is only Borrower with information later than 09/30/2018):
BorrowerName PartnerID Fin_ID DateUpdated
MEHN-45543 19 9 12/31/2018
ARYS-7940 5 10 9/30/2018
FINS-21032 12 11 9/30/2018
ELET-00934 9 12 9/30/2018
MEHN-45543 19 18 9/30/2018
Expected Results (Date Entered is 12/31/2018; MEHN-45543 is only Borrower with information later than 09/30/2018):
BorrowerName PartnerID Fin_ID DateUpdated
MEHN-45543 19 9 12/31/2018
ARYS-7940 5 10 9/30/2018
FINS-21032 12 11 9/30/2018
ELET-00934 9 12 9/30/2018
As mentioned, I am planning to use the results of this Query to generate further queries that use aggregated information from the Financial Indicators to determine portfolio quality at the time.
Please let me know if there is any other information I can provide. And again, thank you in advance.
Try joining BC_Info to a query that aggregates BF_Indicators on BorrowerName, not Fin_ID. Tested with literal date value:
SELECT BC_Info.*, MaxDate
FROM BC_Info
INNER JOIN
(SELECT BorrowerName, Max(DateUpdated) AS MaxDate
FROM BF_Indicators WHERE DateUpdated <=#12/31/2018# GROUP BY BorrowerName) AS Q1
ON BC_Info.BorrowerName=Q1.BorrowerName;
If you need to include Fin_ID in the results, then:
SELECT BC_Info.*, Fin_ID, DateUpdated FROM BC_Info
INNER JOIN
(SELECT * FROM BF_Indicators WHERE Fin_ID IN
(SELECT TOP 1 Fin_ID FROM BF_Indicators AS Dupe
WHERE Dupe.BorrowerName=BF_Indicators.BorrowerName AND DateUpdated<=#12/31/2018#
ORDER BY Dupe.DateUpdated DESC)
) AS Q1
ON BC_Info.BorrowerName = Q1.BorrowerName;
If you don't like TOP N, adjust your original query:
SELECT BCI.BorrowerName, BCI.PartnerID, BFI.Fin_ID, BFI.DateUpdated
FROM ((BC_Info AS BCI
INNER JOIN BF_Indicators AS BFI
ON BFI.BorrowerName = BCI.BorrowerName)
INNER JOIN
(
SELECT BorrowerName, MAX(DateUpdated) AS MAX_DATE
FROM BF_Indicators
WHERE (DateUpdated <= #12/31/2018#)
GROUP BY BorrowerName
) AS Last_BF ON BFI.BorrowerName = Last_BF.BorrowerName AND
BFI.DateUpdated = Last_BF.MAX_DATE);
And 1 more to think about:
SELECT BC_Info.PartnerID, BC_Info.BorrowerName, BF_Indicators.Fin_ID, BF_Indicators.DateUpdated
FROM BC_Info RIGHT JOIN BF_Indicators ON BC_Info.BorrowerName = BF_Indicators.BorrowerName
WHERE (((BF_Indicators.DateUpdated)=DMax("DateUpdated","BF_Indicators","BorrowerName='" & [BC_Info].[BorrowerName] & "' AND DateUpdated<=#12/31/2018#")));
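All of the variants above apply the same rule: per borrower, keep the row with the latest DateUpdated that is on or before the cutoff. A minimal Python sketch of that rule, using the rows from the question's sample output (latest_as_of is a hypothetical helper name):

```python
from datetime import date

# (BorrowerName, Fin_ID, DateUpdated) rows taken from the question's output.
indicators = [
    ("MEHN-45543", 9, date(2018, 12, 31)),
    ("ARYS-7940", 10, date(2018, 9, 30)),
    ("FINS-21032", 11, date(2018, 9, 30)),
    ("ELET-00934", 12, date(2018, 9, 30)),
    ("MEHN-45543", 18, date(2018, 9, 30)),
]

def latest_as_of(rows, cutoff):
    """Return {borrower: (fin_id, date)} for the newest row <= cutoff."""
    latest = {}
    for name, fin_id, updated in rows:
        if updated > cutoff:
            continue  # ignore anything after the "as of" date
        if name not in latest or updated > latest[name][1]:
            latest[name] = (fin_id, updated)
    return latest

as_of_year_end = latest_as_of(indicators, date(2018, 12, 31))
as_of_october = latest_as_of(indicators, date(2018, 10, 31))
print(as_of_year_end["MEHN-45543"], as_of_october["MEHN-45543"])
```

With a cutoff of 12/31/2018, MEHN-45543 resolves to Fin_ID 9. Moving the cutoff back to 10/31/2018 makes it fall back to Fin_ID 18, which is the "see back in time" behaviour the question asks for. Grouping by borrower rather than by Fin_ID is what makes this work.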

SQL Server: selecting a year of account based on a specific date and a date range

I need to apportion some values to a financial year that begins on the 1st December and ends on the 30th November each year.
The rows that contain the value fields are in a table (TABLE A) that has a reference number and an incident date
Table A
ReferenceNumber, Value, IncidentDate
1, 10.00, 01/12/14
2, 15.00, 10/05/13
3, 20.00, 14/10/13
TABLE A is then joined to TABLE B, which also has the reference number and contains transactional data including a start date field. Each reference number may have several transactions with different start date values, and the aim is to ensure the row selected from TABLE B is the one where the start date is the most recent start date before the incident date from table A
TABLE B
ReferenceNumber, StartDate
1, 01/05/14
1, 01/05/15
2, 12/04/14
2, 12/04/15
3, 05/06/14
3, 04/06/15
TABLE C is a time table that apportions specific dates to financial years.
TABLE C
Date, FinancialYear
30/11/14, FY2013/14
01/12/14, FY2014/15
I am trying to construct a query which joins table A to table B on the Reference number and incident date to start date as described above and then adds the FinancialYear value based on the start date from Table B.
I am struggling to get this to return the correct financial year.
In addition, the data quality is poor so there are many examples where the Incident date from table A is greater than the scope of the financial year selected based on the start date from table B.
I need to be able to return either the appropriate financial year based on start date or, failing that, the financial year corresponding to the incident date
Here is the code I currently have:
SELECT a.ReferenceNumber,
b.StartDate,
c.FinancialYear
FROM dbo.TableA a
INNER JOIN dbo.TableB b
ON a.ReferenceNumber = b.ReferenceNumber
AND b.StartDate = (SELECT MIN(StartDate) FROM dbo.TableB WHERE a.IncidentDateTime > StartDate AND ReferenceNumber = a.ReferenceNumber)
INNER JOIN dbo.Calendar c
ON rdc.PolicyStartDate = c.[Date]
select
a.ReferenceNumber,
min(Value) as Value,
min(IncidentDate) as IncidentDate,
max(StartDate) as StartDate, /* others are dummy aggregates but this one is not */
'FY'
+ cast(year(dateadd(month, -11, min(IncidentDate))) as char(4))
+ '/'
+ cast(year(dateadd(month, -11, min(IncidentDate))) - 1999 as char(2)) as FY
from
TableA a cross apply
(
select * from TableB b
where b.ReferenceNumber = a.ReferenceNumber and b.StartDate < a.IncidentDate
) b
group by a.ReferenceNumber
Your fiscal year starts eleven months "late" so it's easy to determine where a date falls without a lookup.
year(dateadd(month, -11, <date>))
Getting it to match your "FY2013/14" format takes a little extra work but you could write little functions to do these kinds of calculations. By the way, the 1999 comes from adding 1 and subtracting 2000 to get a two-digit year value. Could use modulo 100 to make it generic beyond the year 2098 if that's important.
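As a quick way to sanity-check the arithmetic, the same calculation can be sketched in Python. The function name financial_year is made up for this example; it uses the modulo 100 variant for the two-digit part and assumes the year always runs from 1 December to 30 November.

```python
from datetime import date

def financial_year(d):
    """Label a date with its Dec-to-Nov financial year, e.g. 'FY2013/14'."""
    # Shifting back eleven months: December stays in its own calendar year,
    # every other month lands in the previous calendar year.
    start_year = d.year if d.month == 12 else d.year - 1
    return f"FY{start_year}/{(start_year + 1) % 100:02d}"

print(financial_year(date(2014, 11, 30)))
print(financial_year(date(2014, 12, 1)))
```

The two dates shown are the ones listed in TABLE C, and they come out as FY2013/14 and FY2014/15 respectively.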
My assumptions going in:
IncidentDate and StartDate are datatype "DATE". This should also work if they are DATETIME with all time values set the same.
TableC contains a row for every possible date (which is what you implied). Another style would be {FinancialYear, FirstDate, LastDate}, and you'd join to this table using between in the on clause.
I didn't quite get what you meant regarding "the data quality is poor". This query will pull back the desired IncidentDate and StartDate (if available), allowing you to apply business logic to them. My sample here is "if there is no applicable StartDate, base the FinancialYear on IncidentDate". (Replace those outer joins with inner joins if the data permits it.)
Toss in parameters if you don't want this data for all ReferenceNumbers.
Check for syntax errors, I couldn't run and test this query.
(Note that "Date" is a confusing name for a column.)
WITH ctePart1 (ReferenceNumber, IncidentDate, ClosestStartDate)
as (-- Data based on the join to "most recent prior StartDate"
select
ta.ReferenceNumber
,ta.IncidentDate
,max(tb.StartDate)
from TableA ta
left outer join TableB tb
on tb.ReferenceNumber = ta.ReferenceNumber
and tb.StartDate < ta.IncidentDate
group by
ta.ReferenceNumber
,ta.IncidentDate)
select
cte.ReferenceNumber
,cte.IncidentDate
,cte.ClosestStartDate
,isnull(tcStart.FinancialYear, tcIncident.FinancialYear) FinancialYear
from ctePart1 cte
left outer join TableC tcStart
on tcStart.Date = cte.ClosestStartDate
left outer join TableC tcIncident
on tcIncident.Date = cte.IncidentDate
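The fallback rule in that query (use the most recent StartDate before the incident, otherwise the IncidentDate itself) can be checked against the sample rows with a short Python sketch. It assumes the dates in the question are day/month/year, and basis_date is a hypothetical helper name; the returned date is the one you would then look up in TABLE C.

```python
from datetime import date

# Sample rows from the question, read as day/month/year.
incidents = {
    1: date(2014, 12, 1),
    2: date(2013, 5, 10),
    3: date(2013, 10, 14),
}
start_dates = {
    1: [date(2014, 5, 1), date(2015, 5, 1)],
    2: [date(2014, 4, 12), date(2015, 4, 12)],
    3: [date(2014, 6, 5), date(2015, 6, 4)],
}

def basis_date(ref):
    """Most recent start date before the incident, else the incident date."""
    incident = incidents[ref]
    prior = [s for s in start_dates.get(ref, []) if s < incident]
    return max(prior) if prior else incident

basis = {ref: basis_date(ref) for ref in incidents}
for ref, d in basis.items():
    print(ref, d.isoformat())
```

Under that date reading, only reference 1 has a start date before its incident (01/05/14). References 2 and 3 have none, so they fall back to their incident dates.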

SQL Query to determine VAT rate

I'm looking to create a 3 column VAT_Parameter table with the following columns:
VATID, VATRate, EffectiveDate
However, I can't get my head around how I would identify which vat rate applies to an invoice date.
for example if the table was populated with:
1, 17.5, 1/4/1991
2, 15, 1/1/2009
3, 20, 4/1/2011
Say for example I have an invoice dated 4/5/2010, how would an SQL query select the correct VAT rate for that date?
select top 1 *
from VatRate
where EffectiveDate<=@InvoiceDate
order by EffectiveDate desc
Or, with a table of invoices
select id, invoicedate, rate
from
(
select
inv.id, inv.invoicedate, vatrate.rate, ROW_NUMBER() over (partition by inv.id order by vatrate.effectivedate desc) rn
from inv
inner join vatrate
on inv.invoicedate>=vatrate.effectivedate
) v
where rn = 1
PS. The rules for the rate of VAT to be charged when the rate changes are more complicated than just the invoice date. For example, the date of supply also matters.
I've run into this kind of thing before. There are two choices I can think of:
1. Expand the table to have two dates: EffectiveFrom and EffectiveTo. (You'll have to have a convention about whether each of these is exclusive or inclusive - but that's always a problem when using dates). This raises the problem of validating that the table population, as a whole, makes sense. e.g. that you don't end up with one row with Rate1 effective from 1/1/2000-1/1/2002, and another (overlapping) with Rate2 effective from 30/10/2001-1/1/2003. Or an uncovered gap in time, where no rate applies. Since this sounds like a very slowly-changing table, populated occasionally (by people who know what they're doing?), this could be the best solution. The SQL to get the effective rate would then be simple:
SELECT VATRate FROM VATTable WHERE (EffectiveFrom<=[YourInvoiceDate]) AND (EffectiveTo>=[YourInvoiceDate])
or
2. Use your existing table structure, and use some slightly more complicated SQL to determine the effective rate for an invoice.
Using your existing structure, something like this would work:
SELECT VATTAble.VATRate FROM
VATTable
INNER JOIN
(SELECT Max(EffectiveDate) AS LatestDate FROM VATTable WHERE EffectiveDate<=
YourInvoiceDate) latest
ON VATTable.EffectiveDate=latest.LatestDate
An easier option may just be to denormalise your data structure and store the VAT rate in the invoice table itself.
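Whichever SQL form you pick, the lookup rule is "the row with the latest EffectiveDate on or before the invoice date". A small Python sketch of that rule with the three sample rows, assuming the dates in the question are day/month/year (vat_rate_on is a hypothetical helper name):

```python
from bisect import bisect_right
from datetime import date

# (EffectiveDate, VATRate) from the question, sorted by effective date.
vat_rates = [
    (date(1991, 4, 1), 17.5),
    (date(2009, 1, 1), 15.0),
    (date(2011, 1, 4), 20.0),
]
effective_dates = [d for d, _rate in vat_rates]

def vat_rate_on(invoice_date):
    """Rate whose effective date is the latest one <= invoice_date."""
    i = bisect_right(effective_dates, invoice_date)
    if i == 0:
        raise ValueError("no VAT rate in effect on that date")
    return vat_rates[i - 1][1]

print(vat_rate_on(date(2010, 5, 4)))
```

For the invoice dated 4/5/2010 this picks the 15% row, since 1/1/2009 is the latest effective date not after the invoice. An invoice dated exactly on a change date gets the new rate, which matches the <= comparison in the queries above.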
