Joining tables with a full-text search query? - database

I ran into a problem. I have three tables: PRODUCT, which stores each product's price and name; QUERY, with attributes such as the description of the searched words and their frequency (so there are no duplicates); and USERSQUERY, which stores each search made by a user.
PRODUCT (id, price, name)
QUERY (id, description_query, number_of_freq)
USERSQUERY (id, query_id FK, user_id FK, timestamp)
For each month in a given year and the following years (January 2018, February 2018, …), I have to calculate the ratio between search queries that contain a product name and those that do not. If the ratio is not defined for a given month, the output should be NULL.
Do you know how this could be done?
So far I only have this:
select q.description_query,
to_char(uq.timestamp, 'YYYY-MM') as year_month
from usersquery as uq
join query as q ON q.id = uq.query_id;
But I don't really know how to join the product table when the only link is its name attribute. Should I use some sort of full-text search with tsvector?

Note: unquoted table names are case-insensitive in PostgreSQL, so I use product, query, and user_query. Please refer to section 4.1, "Lexical Structure", of the PostgreSQL manual.
I hope I understand correctly: number_of_freq is the number of times the query contains the product name, and number_of_freq = 0 means that the query does not contain the product keyword.
Basically, use generate_series to generate a series of months (for a later LEFT or RIGHT JOIN), and use count with a FILTER clause to count the rows where the frequency is 0.
Final code:
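WITH cte AS (
    -- Per month: all searches, and searches whose query has number_of_freq = 0
    SELECT
        to_char(uq.timestamp, 'YYYY-MM') AS tochar1,
        count(q.number_of_freq) AS count_all,
        count(q.number_of_freq) FILTER (WHERE q.number_of_freq = 0) AS count_0
    FROM
        query q
        JOIN user_query uq ON q.id = uq.query_id
    WHERE
        uq.timestamp >= '2021-01-01 00:00' AT TIME ZONE 'UTC'
        -- half-open upper bound, so searches after 23:59 on Dec 31 are not lost
        AND uq.timestamp < '2023-01-01 00:00' AT TIME ZONE 'UTC'
    GROUP BY
        1
),
cte2 (
    yearmonth
) AS (
    -- One row per month, so months without any searches still appear
    SELECT
        to_char(g, 'YYYY-MM')
    FROM
        generate_series(timestamp '2021-01-01', timestamp '2022-12-01', interval '1 month') g
)
SELECT
    yearmonth,
    cte.count_all,
    cte.count_0,
    -- NULL for months with no searches (no matching row in cte)
    round(cte.count_0::numeric / cte.count_all, 2) AS ratio
FROM
    cte
    RIGHT JOIN cte2 ON cte.tochar1 = yearmonth
ORDER BY
    yearmonth;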
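If you want to try the month-series idea without a PostgreSQL server, here is a rough sketch that swaps in Python's built-in sqlite3 module (an assumption for portability, not what the answer ran): a recursive CTE stands in for generate_series, strftime stands in for to_char, and the timestamp column is renamed ts. The sample rows are made up. The FILTER clause on aggregates needs SQLite 3.30 or later.

```python
import sqlite3

# Hypothetical sample data shaped like the question's QUERY / USERSQUERY tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE query (id INTEGER PRIMARY KEY, description_query TEXT, number_of_freq INTEGER);
CREATE TABLE user_query (id INTEGER PRIMARY KEY, query_id INTEGER, user_id INTEGER, ts TEXT);
INSERT INTO query VALUES (1, 'cheap laptop', 1), (2, 'weather today', 0);
INSERT INTO user_query VALUES
  (1, 1, 10, '2021-01-05 10:00:00'),
  (2, 2, 11, '2021-01-20 12:00:00'),
  (3, 2, 12, '2021-03-02 08:30:00');
""")

sql = """
WITH RECURSIVE months(m) AS (          -- stand-in for generate_series
    SELECT date('2021-01-01')
    UNION ALL
    SELECT date(m, '+1 month') FROM months WHERE m < date('2021-04-01')
),
counts AS (                            -- per-month totals, like cte above
    SELECT strftime('%Y-%m', uq.ts) AS ym,
           count(*) AS count_all,
           count(*) FILTER (WHERE q.number_of_freq = 0) AS count_0
    FROM query q
    JOIN user_query uq ON q.id = uq.query_id
    GROUP BY ym
)
SELECT strftime('%Y-%m', months.m) AS yearmonth,
       round(CAST(counts.count_0 AS REAL) / counts.count_all, 2) AS ratio
FROM months
LEFT JOIN counts ON counts.ym = strftime('%Y-%m', months.m)
ORDER BY yearmonth
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

February and April have no searches in this sample, so the outer join leaves their ratio as NULL (None in Python), which is the behaviour the question asks for.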
As for counting the frequency of words, full-text search won't help, because the full-text parser treats 'product.id' as a single token, 'product.id'.
You may need regexp string-splitting functions (for example regexp_split_to_table).
Refer to the count-frequency demo to solve the word-count frequency issue.
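The regexp-splitting idea can be sketched outside the database too. This is a hypothetical Python illustration, not the linked demo: it splits on every non-alphanumeric character (so 'product.id' becomes two words), counts word frequencies, and checks whether a search contains a product name. It assumes ASCII text and single-word product names; the function names and sample queries are made up.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Split on anything that is not an ASCII letter or digit, so
    # 'product.id' becomes ['product', 'id'] instead of one token.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def word_frequencies(queries: list[str]) -> Counter:
    counts: Counter = Counter()
    for q in queries:
        counts.update(tokenize(q))
    return counts


def contains_product(query: str, product_names: set[str]) -> bool:
    # True if any whole word of the query equals a lower-cased product name
    return any(tok in product_names for tok in tokenize(query))


queries = ["Cheap laptop deals", "laptop.bag sale", "weather today"]
freq = word_frequencies(queries)
print(freq["laptop"])
print([contains_product(q, {"laptop"}) for q in queries])
```

Multi-word product names would need a phrase match instead of a per-token lookup.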

Related

Retrieve the first and last available records based on dates

I'm trying to understand how to extract the first and last records available based on dates with the following example:
SELECT clientID, AssessmentDate, TotalScore
FROM Client.Assessments
For each client (identified by clientID), I am trying to retrieve the TotalScore of their first and last available assessments (based on AssessmentDate). I deal with lots of assessment entries, and I usually do a pre/post statistical analysis that compares a client's first assessment with their last available one.
The easiest is to think in two steps. First, prepare the min/max dates for each client. Second, select for rows with these dates.
SELECT clientsMaxMin.clientID
, ca.AssessmentDate
, ca.TotalScore
FROM
(
SELECT clientID
, max(AssessmentDate) as maxDate
, min(AssessmentDate) as minDate
FROM Client.Assessments AS c
GROUP BY c.clientID
) clientsMaxMin -- prepare a smaller table with max and min dates
JOIN Client.Assessments AS ca -- from the original table select only rows with min/max values
ON ca.clientID = clientsMaxMin.clientID -- same client, not just the same date
AND (ca.AssessmentDate = clientsMaxMin.maxDate
OR ca.AssessmentDate = clientsMaxMin.minDate)
ORDER BY clientsMaxMin.clientID, ca.AssessmentDate
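To check the two-step approach, here is a small sketch that runs the same min/max join in SQLite via Python's sqlite3 module (SQLite is a stand-in for the original database; the table name Assessments and the rows are made up). Client 1 has an assessment on the same date as client 2's first one, which is exactly the case where the join must also match on clientID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Assessments (clientID INTEGER, AssessmentDate TEXT, TotalScore INTEGER);
INSERT INTO Assessments VALUES
  (1, '2020-01-10', 40), (1, '2020-06-01', 35), (1, '2021-02-15', 22),
  (2, '2020-06-01', 50), (2, '2020-09-30', 41);
""")

sql = """
SELECT mm.clientID, ca.AssessmentDate, ca.TotalScore
FROM (
    -- step 1: first and last assessment date per client
    SELECT clientID, min(AssessmentDate) AS minDate, max(AssessmentDate) AS maxDate
    FROM Assessments
    GROUP BY clientID
) AS mm
JOIN Assessments AS ca
  ON ca.clientID = mm.clientID                  -- without this, other clients' rows leak in
 AND ca.AssessmentDate IN (mm.minDate, mm.maxDate)  -- step 2: keep only those rows
ORDER BY mm.clientID, ca.AssessmentDate
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

A client with a single assessment gets one row, since its min and max dates are equal.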

How do I dynamically generate dates between two dates in Snowflake?

I've been searching for a good generate_series analog in Snowflake but what I've found so far is a bit limiting in scope. Most of the examples I've seen use rowcount but I need something more dynamic than that.
I have these columns:
location_id, subscription_id, start_date, end_date
The DATEDIFF between the two date columns is usually one year, but there are many instances where it isn't, so I need to account for that.
How do I generate a gapless date range between my start and end dates?
Thank you!
There are several ways to approach this, but here's the way I do it, with the SQL Generator function Datespine_Groups.
The reason I like doing it this way is that it's flexible enough that I can add weekly, hourly, or monthly intervals between the dates and reuse the code.
The group bounds parameter changes the way the join happens in a subtle way that lets you control how the dates get filtered out:
global - every location_id, subscription_id combination will start on the same start_date
local - every location_id, subscription_id has their own start/end dates based on the first and last values in the date column
mixed - every location_id, subscription_id has their own start/end dates, but they all share the same end date
Rather than trying to make it perfect in one query, I think it's probably easier to generate it with mixed and then filter out rows where group_start_date occurs after the end_date of your original data.
Here's the SQL. At the very beginning you can either (1) find a way to dynamically generate the three parameters, or (2) hard-code a ridiculous range that'll last your career and let the rest of the query filter them out :)
You can change month to another date part; I only assumed you were looking for monthly intervals.
WITH GLOBAL_SPINE AS (
SELECT
ROW_NUMBER() OVER (
ORDER BY
NULL
) as INTERVAL_ID,
DATEADD(
'month',
(INTERVAL_ID - 1),
'2018-01-01T00:00' :: timestamp_ntz
) as SPINE_START,
DATEADD(
'month', INTERVAL_ID, '2018-01-01T00:00' :: timestamp_ntz
) as SPINE_END
FROM
TABLE (
GENERATOR(ROWCOUNT => 2192)
)
),
GROUPS AS (
SELECT
location_id,
subscription_id,
MIN(start_date) AS LOCAL_START,
MAX(start_date) AS LOCAL_END
FROM
My_First_Table
GROUP BY
location_id,
subscription_id
),
GROUP_SPINE AS (
SELECT
location_id,
subscription_id,
SPINE_START AS GROUP_START,
SPINE_END AS GROUP_END
FROM
GROUPS G CROSS
JOIN LATERAL (
SELECT
SPINE_START,
SPINE_END
FROM
GLOBAL_SPINE S
WHERE
S.SPINE_START >= G.LOCAL_START
)
)
SELECT
G.location_id AS GROUP_BY_location_id,
G.subscription_id AS GROUP_BY_subscription_id,
GROUP_START,
GROUP_END,
T.*
FROM
GROUP_SPINE G
LEFT JOIN My_First_Table T ON start_date >= G.GROUP_START
AND start_date < G.GROUP_END
AND G.location_id = T.location_id
AND G.subscription_id = T.subscription_id
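For comparison, the "local" bounds idea (each row gets its own start and end) can also be done with a recursive CTE in engines that support one. This is a sketch in SQLite through Python's sqlite3 module, a different technique from Snowflake's GENERATOR used above; the table name subscriptions and its rows are hypothetical. Each subscription is expanded into one row per month from the month of start_date up to and including the month that starts on or before end_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subscriptions (location_id INTEGER, subscription_id INTEGER,
                            start_date TEXT, end_date TEXT);
INSERT INTO subscriptions VALUES
  (1, 100, '2021-01-15', '2021-04-10'),
  (2, 200, '2021-11-01', '2022-01-31');
""")

sql = """
WITH RECURSIVE spine(location_id, subscription_id, month_start, end_date) AS (
    -- anchor: first month of each subscription
    SELECT location_id, subscription_id,
           date(start_date, 'start of month'), end_date
    FROM subscriptions
    UNION ALL
    -- step: add one month until we pass that row's own end_date
    SELECT location_id, subscription_id,
           date(month_start, '+1 month'), end_date
    FROM spine
    WHERE date(month_start, '+1 month') <= end_date
)
SELECT location_id, subscription_id, month_start
FROM spine
ORDER BY location_id, month_start
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Because the stop condition uses each row's own end_date, there is no global row count to guess and no filtering step afterwards.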

How to apply WHERE before an aggregate function (Postgres)

It's hard to explain in the title, but this is my SQL:
SELECT
SUM("payments"."amount"),
"invoices"."property_id"
FROM "payments"
JOIN "invoices"
ON "payments"."invoice_id" = "invoices"."id"
GROUP BY "property_id"
It returns the sum of all Payment records (amount column) for a particular Property (which is connected through its invoices).
In other words:
Property has_many :invoices
Invoice has_one :payment
I'm trying to select payments within a particular date range, but the filtering has to happen "before" the aggregate function (so the exact query above, but only for 2017-01-01 through 2017-02-01). The field is generated_at on Payment.
You are looking for a WHERE clause. (WHERE is executed before aggregation; HAVING is executed after.) Suggested date literals in PostgreSQL are ANSI standard DATE 'YYYY-MM-DD'. Date ranges are usually checked with >= start day and < end day + 1 (in order to deal properly with the time part if any).
SELECT
SUM(p.amount),
i.property_id
FROM payments p
JOIN invoices i ON p.invoice_id = i.id
WHERE p.generated_at >= DATE '2017-01-01'
AND p.generated_at < DATE '2017-02-02'
GROUP BY i.property_id;
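Here is a quick way to see the half-open range at work: a sketch of the same query in SQLite through Python's sqlite3 module, with made-up rows. SQLite has no DATE 'YYYY-MM-DD' literal, so the sketch stores ISO-8601 text, which compares correctly as strings. The payment at 14:30 on 2017-02-01 is kept because the upper bound is the day after the range ends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (id INTEGER PRIMARY KEY, property_id INTEGER);
CREATE TABLE payments (id INTEGER PRIMARY KEY, invoice_id INTEGER,
                       amount REAL, generated_at TEXT);
INSERT INTO invoices VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO payments VALUES
  (1, 1, 100.0, '2017-01-15 09:00:00'),
  (2, 2,  50.0, '2017-02-01 14:30:00'),  -- last day of the range, with a time part
  (3, 3,  75.0, '2017-03-01 00:00:00');  -- outside the range
""")

sql = """
SELECT i.property_id, SUM(p.amount) AS total
FROM payments p
JOIN invoices i ON p.invoice_id = i.id
WHERE p.generated_at >= '2017-01-01'   -- WHERE filters rows before GROUP BY
  AND p.generated_at <  '2017-02-02'   -- end day + 1 keeps all of 2017-02-01
GROUP BY i.property_id
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Property 20 disappears entirely because its only payment is filtered out before aggregation; a HAVING clause could not do that, since it only sees the already-summed groups.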

Resolving duplicates in Access

I have a table that depends on more than one table, and this is the final result (see the screenshot).
I need to choose among the values when the first date is duplicated, according to specific criteria.
For example, I need one row for 18.2.2016 with the max value (take the greater one) and the min value (take the lesser one).
You need to provide us with better information, but here is what I think you're looking for.
You need a separate query for each min/max value you want to find. Where you see "MyTable" you need to replace it with the object name shown in the screenshot.
Query 1 "Max"
SELECT MyTable.FirstOfDate, Max(MyTable.MaxValue) AS MaxOfMaxValue
FROM MyTable
GROUP BY MyTable.FirstOfDate;
Query 2 "Min"
SELECT MyTable.FirstOfDate, Min(MyTable.MinValue) AS MinOfMinValue
FROM MyTable
GROUP BY MyTable.FirstOfDate;
Query 3 "Merge"
SELECT DISTINCT MyTable.FirstOfDate, Max.MaxOfMaxValue, Min.MinOfMinValue
FROM (MyTable
INNER JOIN [Max] ON MyTable.FirstOfDate = Max.FirstOfDate)
INNER JOIN [Min] ON MyTable.FirstOfDate = Min.FirstOfDate
GROUP BY MyTable.FirstOfDate, Max.MaxOfMaxValue, Min.MinOfMinValue;
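As an alternative, both aggregates can usually be computed in a single grouped query, which may make the separate Max/Min queries and the merge unnecessary. This sketch runs in SQLite via Python's sqlite3 module rather than Access; MyTable and its columns follow the answer's placeholder names, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (FirstOfDate TEXT, MaxValue INTEGER, MinValue INTEGER);
INSERT INTO MyTable VALUES
  ('2016-02-18', 7, 3), ('2016-02-18', 9, 1), ('2016-02-19', 4, 2);
""")

# One row per date: the greatest MaxValue and the smallest MinValue.
sql = """
SELECT FirstOfDate,
       max(MaxValue) AS MaxOfMaxValue,
       min(MinValue) AS MinOfMinValue
FROM MyTable
GROUP BY FirstOfDate
ORDER BY FirstOfDate
"""
rows = conn.execute(sql).fetchall()
print(rows)
```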

SQL Server: selecting a year of account based on a specific date and a date range

I need to apportion some values to a financial year that begins on the 1st December and ends on the 30th November each year.
The rows that contain the value fields are in a table (TABLE A) that has a reference number and an incident date.
Table A
ReferenceNumber, Value, IncidentDate
1, 10.00, 01/12/14
2, 15.00, 10/05/13
3, 20.00, 14/10/13
TABLE A is then joined to TABLE B, which also has the reference number and contains transactional data, including a start date field. Each reference number may have several transactions with different start dates, and the aim is to ensure that the row selected from TABLE B is the one with the most recent start date before the incident date from TABLE A.
TABLE B
ReferenceNumber, StartDate
1, 01/05/14
1, 01/05/15
2, 12/04/14
2, 12/04/15
3, 05/06/14
3, 04/06/15
TABLE C is a time table that apportions specific dates to financial years.
TABLE C
Date, FinancialYear
30/11/14, FY2013/14
01/12/14, FY2014/15
I am trying to construct a query that joins Table A to Table B on the reference number, matching incident date to start date as described above, and then adds the FinancialYear value based on the start date from Table B.
I am struggling to get this to return the correct financial year.
In addition, the data quality is poor, so there are many examples where the incident date from Table A falls outside the financial year selected based on the start date from Table B.
I need to be able to return either the appropriate financial year based on the start date or, failing that, the financial year corresponding to the incident date.
Here is the code I currently have:
SELECT a.ReferenceNumber,
b.StartDate,
c.FinancialYear
FROM dbo.TableA a
INNER JOIN dbo.TableB b
ON a.ReferenceNumber = b.ReferenceNumber
AND b.StartDate = (SELECT MIN(StartDate) FROM dbo.TableB WHERE a.IncidentDate > StartDate AND ReferenceNumber = a.ReferenceNumber)
INNER JOIN dbo.Calendar c
ON b.StartDate = c.[Date]
select
a.ReferenceNumber,
min(a.Value) as Value,
min(a.IncidentDate) as IncidentDate,
max(b.StartDate) as StartDate, /* others are dummy aggregates but this one is not */
'FY'
+ cast(year(dateadd(month, -11, min(a.IncidentDate))) as char(4))
+ '/'
+ cast(year(dateadd(month, -11, min(a.IncidentDate))) - 1999 as char(2)) as FY
from
TableA a cross apply
(
-- all earlier transactions for this reference; max() above picks the latest
select * from TableB b
where b.ReferenceNumber = a.ReferenceNumber and b.StartDate < a.IncidentDate
) b
group by a.ReferenceNumber
Your fiscal year starts eleven months "late" so it's easy to determine where a date falls without a lookup.
year(dateadd(month, -11, <date>))
Getting it to match your "FY2013/14" format takes a little extra work, but you could write small functions to do these kinds of calculations. By the way, the 1999 comes from adding 1 and subtracting 2000 to get a two-digit year value. You could use modulo 100 to make it generic beyond the year 2098 if that's important.
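To see why the eleven-month shift works, here is a small Python sketch of the same arithmetic (the function name financial_year is hypothetical). Shifting a date back eleven months lands December in January of the same year and every other month in the previous year, so the financial year is named after the calendar year in which its December falls. It uses modulo 100 for the second part, as suggested above.

```python
from datetime import date


def financial_year(d: date) -> str:
    # Equivalent to year(dateadd(month, -11, d)) for a year starting 1 December:
    # December stays in its own year, January..November fall back a year.
    start_year = d.year if d.month == 12 else d.year - 1
    # Modulo 100 keeps the suffix at two digits past 2099 (2099 -> "FY2099/00").
    return f"FY{start_year}/{(start_year + 1) % 100:02d}"


print(financial_year(date(2014, 11, 30)))  # last day of FY2013/14
print(financial_year(date(2014, 12, 1)))   # first day of FY2014/15
```

The two sample dates match the two rows shown for TABLE C in the question.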
My assumptions going in:
IncidentDate and StartDate are datatype "DATE". This should also work if they are DATETIME with all time values set the same.
TableC contains a row for every possible date (which is what you implied). Another style would be {FinancialYear, FirstDate, LastDate}, and you'd join to this table using BETWEEN in the ON clause.
I didn't quite get what you meant regarding "the data quality is poor". This query will pull back the desired IncidentDate and StartDate (if available), allowing you to apply business logic to them. My sample here is: "if there is no applicable StartDate, base the FinancialYear on IncidentDate." (Replace those outer joins with inner joins if the data permits it.)
Toss in parameters if you don't want this data for all ReferenceNumbers.
Check for syntax errors; I couldn't run and test this query.
(Note that "Date" is a confusing name for a column.)
WITH ctePart1 (ReferenceNumber, IncidentDate, ClosestStartDate)
as (-- Data based on the join to "most recent prior StartDate"
select
ta.ReferenceNumber
,ta.IncidentDate
,max(tb.StartDate)
from TableA ta
left outer join TableB tb
on tb.ReferenceNumber = ta.ReferenceNumber
and tb.StartDate < ta.IncidentDate
group by
ta.ReferenceNumber
,ta.IncidentDate)
select
cte.ReferenceNumber
,cte.IncidentDate
,cte.ClosestStartDate
,isnull(tcStart.FinancialYear, tcIncident.FinancialYear) FinancialYear
from ctePart1 cte
left outer join TableC tcStart
on tcStart.Date = cte.ClosestStartDate
left outer join TableC tcIncident
on tcIncident.Date = cte.IncidentDate
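Since the query above was not tested, here is a rough sketch of the same LEFT JOIN + MAX + fallback logic run in SQLite through Python's sqlite3 module (a stand-in for SQL Server; COALESCE replaces ISNULL). The rows are the question's sample data, with dates read as dd/mm/yy, which is an assumption. The calendar table is filled with one row per day over an assumed range, using the eleven-month rule to label each day.

```python
import sqlite3
from datetime import date, timedelta


def financial_year(d: date) -> str:
    # Financial year starting 1 December, labelled e.g. "FY2013/14"
    start_year = d.year if d.month == 12 else d.year - 1
    return f"FY{start_year}/{(start_year + 1) % 100:02d}"


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ReferenceNumber INTEGER, Value REAL, IncidentDate TEXT);
CREATE TABLE TableB (ReferenceNumber INTEGER, StartDate TEXT);
CREATE TABLE TableC (Date TEXT PRIMARY KEY, FinancialYear TEXT);
INSERT INTO TableA VALUES (1, 10.0, '2014-12-01'), (2, 15.0, '2013-05-10'), (3, 20.0, '2013-10-14');
INSERT INTO TableB VALUES (1, '2014-05-01'), (1, '2015-05-01'), (2, '2014-04-12'),
                          (2, '2015-04-12'), (3, '2014-06-05'), (3, '2015-06-04');
""")

# Calendar table with one row per day over an assumed range.
d = date(2012, 12, 1)
while d <= date(2016, 11, 30):
    conn.execute("INSERT INTO TableC VALUES (?, ?)", (d.isoformat(), financial_year(d)))
    d += timedelta(days=1)

sql = """
WITH ctePart1 AS (
    -- most recent StartDate before the incident, or NULL if there is none
    SELECT ta.ReferenceNumber, ta.IncidentDate, max(tb.StartDate) AS ClosestStartDate
    FROM TableA ta
    LEFT JOIN TableB tb
      ON tb.ReferenceNumber = ta.ReferenceNumber AND tb.StartDate < ta.IncidentDate
    GROUP BY ta.ReferenceNumber, ta.IncidentDate
)
SELECT cte.ReferenceNumber, cte.IncidentDate, cte.ClosestStartDate,
       -- fall back to the incident date's year when no start date applies
       coalesce(tcStart.FinancialYear, tcIncident.FinancialYear) AS FinancialYear
FROM ctePart1 cte
LEFT JOIN TableC tcStart ON tcStart.Date = cte.ClosestStartDate
LEFT JOIN TableC tcIncident ON tcIncident.Date = cte.IncidentDate
ORDER BY cte.ReferenceNumber
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

With this sample data, references 2 and 3 have no start date before their incident date, so their financial year comes from the incident date: exactly the poor-data-quality case described in the question.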
