I know this has been asked and answered a few times here, but I can't seem to find the answer to my specific problem. Here's the recursive query:
CTE as (
SELECT
ZipCode
,Age
,[Population]
,Deaths
,DeathRate
,Death_Proportion
,DeathProbablity
,SurvivalProbablity
,PersonsAlive
FROM ProbabilityTable
WHERE Age = 0
UNION ALL
SELECT
p.ZipCode
,p.Age
,p.[Population]
,p.Deaths
,p.DeathRate
,p.Death_Proportion
,p.DeathProbablity
,p.SurvivalProbablity
,LAG(c.PersonsAlive,1) OVER(PARTITION BY p.ZipCode ORDER BY p.Age) * p.SurvivalProbablity
FROM ProbabilityTable p
INNER JOIN CTE c
ON p.ZipCode = c.ZipCode
and p.Age = c.Age
WHERE p.Age < 86
)
In the ProbabilityTable, PersonsAlive is set to 100,000 when Age = 0. What I'm looking to do with the recursive CTE is multiply the previous value of PersonsAlive by the current SurvivalProbability to calculate the PersonsAlive for that Age. Age goes up to 85, which is why I have my termination clause set at 86.
I've tried tweaking the recursive part of the query a number of times (and also setting PersonsAlive to 100,000 in the anchor part) but I can't figure it out. This is my first attempt at a recursive query and even with some course work it's not clicking for me.
EDIT
Here is the updated code that actually runs:
CTE as (
SELECT
ZipCode
,Age
,[Population]
,Deaths
,DeathRate
,Death_Proportion
,DeathProbablity
,SurvivalProbablity
,PersonsAlive
FROM ProbabilityTable
WHERE Age = 0
UNION ALL
SELECT
p.ZipCode
,p.Age
,p.[Population]
,p.Deaths
,p.DeathRate
,p.Death_Proportion
,p.DeathProbablity
,p.SurvivalProbablity
,LAG(c.PersonsAlive,1) OVER(PARTITION BY p.ZipCode ORDER BY p.Age) * p.SurvivalProbablity
FROM ProbabilityTable p
INNER JOIN CTE c
ON p.ZipCode = c.ZipCode
and p.Age = c.Age + 1
WHERE p.Age < 6
)
And here are the results it returns:
What I want the results to be for PersonsAlive is as follows:
So with each iteration of the CTE, it needs to reference the previous row of PersonsAlive and the current row of SurvivalProbability to calculate PersonsAlive
It's hard to test this without your raw data, but I think your issue is the LAG() call: the recursive CTE already gives you the previous row, so lagging on top of it looks for a row that isn't there.
When you're using a recursive CTE, you already have access to the previous row via CTE c. In the recursive member, the join p.Age = c.Age + 1 pairs each row of p with the row of c computed for the previous age, so c.PersonsAlive is already the previous value. LAG(c.PersonsAlive,1) then tries to step back one more row within the rows the recursive member is currently processing.
On each recursive pass there is only one such row per ZipCode, so there is nothing to lag back to and LAG() returns NULL by default. This is why every row in your results has NULL for the PersonsAlive column, except for the first row (the anchor row from the first half of your UNION ALL). If you remove the LAG() function and just use c.PersonsAlive * p.SurvivalProbablity, you should get all of the expected PersonsAlive values.
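To make the suggested change concrete, here is a minimal sketch that runs the recursive query without LAG(). It uses Python's built-in sqlite3 module as a stand-in for SQL Server (so it needs WITH RECURSIVE, which T-SQL does not), keeps only four of the columns, and the sample zip codes and probabilities are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProbabilityTable ("
    "ZipCode TEXT, Age INTEGER, SurvivalProbablity REAL, PersonsAlive REAL)"
)
# Hypothetical sample data: PersonsAlive is only populated for Age = 0.
sample_rows = [
    ("00001", 0, 1.0, 100000.0),
    ("00001", 1, 0.5, None),
    ("00001", 2, 0.5, None),
    ("00002", 0, 1.0, 100000.0),
    ("00002", 1, 0.25, None),
]
conn.executemany("INSERT INTO ProbabilityTable VALUES (?, ?, ?, ?)", sample_rows)

query = """
WITH RECURSIVE CTE AS (
    -- Anchor: the Age = 0 row of every ZipCode.
    SELECT ZipCode, Age, SurvivalProbablity, PersonsAlive
    FROM ProbabilityTable
    WHERE Age = 0
    UNION ALL
    -- Recursive step: c is the previous age's row, so no LAG() is needed.
    SELECT p.ZipCode, p.Age, p.SurvivalProbablity,
           c.PersonsAlive * p.SurvivalProbablity
    FROM ProbabilityTable p
    INNER JOIN CTE c
        ON p.ZipCode = c.ZipCode
       AND p.Age = c.Age + 1
)
SELECT ZipCode, Age, PersonsAlive
FROM CTE
ORDER BY ZipCode, Age
"""
persons_alive = conn.execute(query).fetchall()
for row in persons_alive:
    print(row)
```

The recursion stops by itself once no row with Age = c.Age + 1 exists, so in this sketch the WHERE p.Age < 86 guard is left out.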
That being said, a recursive CTE seems like overkill here and you probably can just use the LAG() window function in a static call on your ProbabilityTable like so:
SELECT
ZipCode,
Age,
[Population],
Deaths,
DeathRate,
Death_Proportion,
DeathProbablity,
SurvivalProbablity,
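-- Previous row's PersonsAlive times this row's SurvivalProbablity; the Age = 0 row keeps its own value.
-- This only reads stored PersonsAlive values, so it is correct only where the previous row is already populated.
ISNULL(LAG(PersonsAlive,1) OVER (PARTITION BY ZipCode ORDER BY Age) * SurvivalProbablity, PersonsAlive) AS PersonsAlive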
FROM ProbabilityTable
As I mentioned, I can't really test this, so please let me know if you run into any issues, and I'll help you accordingly.
Recursive CTEs are good for tree-like problems, e.g. when you need to compare multiple child rows to their parent, or interact with multiple levels of the tree simultaneously. Window functions like LAG() allow you to interact with any single row at a time relative to the current row. Your problem seems to be the latter kind.
The Sort operator says it's only got 100 rows to sort. How could that possibly be more expensive than reading 1.9 million rows? I must be reading this wrong or misunderstanding something.
Also, how is the Estimated Number of Rows Per Execution in the Sort operator only 100? If the Index Seek operator estimates the Number of Rows Per Execution to be 1.9 million, how does only 100 rows get piped over to the Sort operator?
Here is the query:
DECLARE @PageIndex INT = 1000;
DECLARE @PageCount INT = 1000;
SELECT ID
FROM dbo.Table1
WHERE DateCreated >= '2021-10-27'
AND
DateCreated < '2021-10-28'
ORDER BY ID
OFFSET @PageIndex * @PageCount ROWS FETCH NEXT @PageCount ROWS ONLY
The Sort operator (as opposed to "Top N Sort") will sort the entirety of its input in its open method before returning any rows.
SQL Server estimates that the seek will output 1.9 million rows that go into the sort.
The costing is therefore for sorting 1.9 million rows.
You are doing
OFFSET 1000000 ROWS FETCH NEXT 1000 ROWS ONLY
The actual output rows from the sort will be at least 1,001,000 (maybe more in a parallel plan) and the TOP operator discards the first million for the offset and then stops requesting rows after it has received the 1000 to be returned.
The estimate of 100 is just a guess as SQL Server has no idea what the value of the variables will be at runtime when the plan is compiled.
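As a rough illustration of why the sort has to see all of its input before the offset can be applied, here is a small Python sketch. It is an analogy, not a model of SQL Server's operators: page_with_full_sort and page_with_top_n are hypothetical helper names, and the data is a made-up shuffled list of IDs.

```python
import heapq

def page_with_full_sort(ids, offset, fetch):
    # Like a plain Sort: every input row is consumed and ordered
    # before the first row can be returned.
    ordered = sorted(ids)
    return ordered[offset:offset + fetch]

def page_with_top_n(ids, offset, fetch):
    # Like a Top N Sort: only the offset + fetch smallest rows are kept,
    # then the first `offset` of them are discarded.
    smallest = heapq.nsmallest(offset + fetch, ids)
    return smallest[offset:]

# A shuffled permutation of 0..10006 standing in for the seek output.
ids = [(i * 7919) % 10007 for i in range(10007)]

page_index, page_count = 3, 100
offset = page_index * page_count      # rows skipped by OFFSET
rows_needed = offset + page_count     # rows that must come out in order

full_sort_page = page_with_full_sort(ids, offset, page_count)
top_n_page = page_with_top_n(ids, offset, page_count)
print(rows_needed, full_sort_page[0], full_sort_page[-1])
```

Both functions return the same page, but in each case offset + fetch ordered rows have to be produced before the page itself is complete, which mirrors the 1,001,000 rows mentioned above.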
I have a table which can contain up to billions of rows
CREATE TABLE "Log4DataUsb" (
"Time" integer primary key not null ,
"Microseconds" integer ,
"Current" integer ,
"Voltage" integer )
Usually a user will want to query the data within a specific range, for example Time <= 123456789 and Time >= 0. Because this may return billions of rows, I want to segment the result and return only one batch at a time, e.g. LIMIT 10000, then LIMIT 10000 OFFSET X, until it reaches the end of the time range.
I notice that as the number of rows goes up, this gets quite slow: executing the queries below takes seconds, even though I just want to move to the next batch.
SELECT * FROM Log4DataUsb WHERE Time <= 123456789 and Time >= 0 LIMIT 10000
SELECT * FROM Log4DataUsb WHERE Time <= 123456789 and Time >= 0 LIMIT 10000 OFFSET 10000
If the database is supposed to have 2 billion rows in total, is there any way to substantially improve the query performance?
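One commonly used alternative to a growing OFFSET is keyset (seek) pagination: remember the last Time value of the previous batch and start the next batch just after it, so the primary key can be used to jump straight to the right place. The following is a minimal sketch under that assumption, using Python's built-in sqlite3 module; fetch_batches is a hypothetical helper and the 25 sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Log4DataUsb" ('
    '"Time" integer primary key not null, '
    '"Microseconds" integer, "Current" integer, "Voltage" integer)'
)
# Hypothetical sample data: Time values 1..25.
conn.executemany(
    'INSERT INTO "Log4DataUsb" VALUES (?, ?, ?, ?)',
    [(t, 0, t % 7, t % 5) for t in range(1, 26)],
)

def fetch_batches(conn, start, end, batch_size):
    """Yield batches with start <= Time <= end, resuming after the last key seen."""
    last_time = start - 1  # integer keys: Time > start - 1 means Time >= start
    while True:
        batch = conn.execute(
            'SELECT "Time", "Microseconds", "Current", "Voltage" '
            'FROM "Log4DataUsb" '
            'WHERE "Time" > ? AND "Time" <= ? '
            'ORDER BY "Time" LIMIT ?',
            (last_time, end, batch_size),
        ).fetchall()
        if not batch:
            return
        yield batch
        last_time = batch[-1][0]  # key of the last row in this batch

batches = list(fetch_batches(conn, 0, 20, 8))
batch_sizes = [len(b) for b in batches]
all_times = [row[0] for b in batches for row in b]
print(batch_sizes)
```

The trade-off is that batches can only be walked in key order; jumping directly to an arbitrary batch number still needs an offset or a known key value.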
I want to fetch the data in chunks (using a select query) like in first attempt from 1 to 50 records and in second attempt from 51 to 100 records.
Use LIMIT and OFFSET. The following query returns 50 records after skipping the first 50, so records 51 - 100 are returned.
SELECT fname, lname
FROM students
ORDER BY ssn
LIMIT 50 OFFSET 50;
https://www.postgresql.org/docs/current/static/queries-limit.html
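As a small runnable sketch of fetching successive chunks this way, the example below uses Python's built-in sqlite3 module instead of PostgreSQL (the LIMIT ... OFFSET ... syntax is the same in both). fetch_chunk is a hypothetical helper, and the 120 student rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (ssn INTEGER PRIMARY KEY, fname TEXT, lname TEXT)")
# Hypothetical sample data: 120 students with ssn 1..120.
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(n, f"first{n}", f"last{n}") for n in range(1, 121)],
)

def fetch_chunk(conn, chunk_number, chunk_size=50):
    """Return chunk 1, 2, 3, ... of the students ordered by ssn."""
    offset = (chunk_number - 1) * chunk_size  # rows to skip before this chunk
    return conn.execute(
        "SELECT ssn, fname, lname FROM students ORDER BY ssn LIMIT ? OFFSET ?",
        (chunk_size, offset),
    ).fetchall()

first_chunk = fetch_chunk(conn, 1)    # records 1 to 50
second_chunk = fetch_chunk(conn, 2)   # records 51 to 100
third_chunk = fetch_chunk(conn, 3)    # the remaining 20 records
print(first_chunk[0], second_chunk[0], len(third_chunk))
```

The ORDER BY matters here: without a stable ordering, the same row could show up in two chunks or in none.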
I'm new to SQL Server, so I apologize if my question seems too easy. I tried finding an answer on my own, but I'm failing miserably. I am trying to create a query which will return the total size on the drive of each row in the table.
I thought about using DBCC SHOWCONTIG, but it doesn't work for varchar(max), which appears in my table. Also, it doesn't return the size of each row, but rather the average, max and min size. My reading so far suggests that it is not possible to write a query that shows the size of each individual row in the table, so I decided to settle for the total length of all characters in each column in each row. Indirectly, it will give me an idea of the size of each row.
I have a table with some varchar(500) and varchar(max) columns. I noticed that some of the rows are a lot bigger than others.
What I need is top 1000 longest rows in the table, preferably in an output showing two columns:
Column 1 showing EntryID
Column 2 showing total length of the characters in all columns together for that record (eg total length of the characters in the column 1 + total length of the characters in the column 2 + column3 + column4 etc...) It would be great if this could be aliased RowLength.
What I tried so far is:
SELECT TOP 1000
(LEN(columnname1) + LEN(columnname2) + LEN(columnname3) + LEN(columnname4)) as RowLength,
FROM dbo.tablename
ORDER BY Length Desc
It works, but it doesn't show entry ID corresponding to the total length of all characters in the row. How do I add it?
It also doesn't show the alias for the column showing number of characters in the row.
Could you please suggest how I can change the query to get the expected outcome? I'll be very grateful for any suggestions.
it doesn't show EntryID corresponding to the total length of all
characters in the row. It also doesn't show the alias for the column
showing number of characters in the row.
You have not specified an alias, so what should it show? You also haven't selected EntryID. If you want the longest 1000 rows you have to order by the length:
SELECT TOP 1000
EntryID,
Length = LEN(columnname1) + LEN(columnname2) + LEN(columnname3) + LEN(columnname4)
FROM dbo.tablename
ORDER BY Length DESC
SELECT TOP 1000 EntryID,
(LEN(columnname1) + LEN(columnname2) + LEN(columnname3) + LEN(columnname4)) AS RowLength
FROM dbo.tablename
ORDER BY RowLength DESC
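One thing to watch with this approach: if any of the columns is NULL, its length is NULL and so is the whole sum, which would push that row out of the ranking. A minimal sketch of guarding against that is shown below. It uses Python's built-in sqlite3 module as a stand-in, so LENGTH, COALESCE and LIMIT take the place of T-SQL's LEN, ISNULL and TOP; the table has only two text columns and the three sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename ("
    "EntryID INTEGER PRIMARY KEY, columnname1 TEXT, columnname2 TEXT)"
)
# Hypothetical sample data; row 2 has a NULL in columnname2.
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [(1, "abc", "de"), (2, "abcdefgh", None), (3, "a", "b")],
)

query = """
SELECT EntryID,
       -- Treat a NULL column as length 0 so it does not null out the sum.
       COALESCE(LENGTH(columnname1), 0)
     + COALESCE(LENGTH(columnname2), 0) AS RowLength
FROM tablename
ORDER BY RowLength DESC
LIMIT 2
"""
longest_rows = conn.execute(query).fetchall()
print(longest_rows)
```

In SQL Server the same idea would be written with ISNULL(LEN(columnname1), 0) and so on; DATALENGTH can be used instead of LEN if the size in bytes is of more interest than the character count.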