I have a large data set of vehicle fitment information for products, with each fitment on its own row.
I am struggling to create a query that selects only the minimum and maximum years for each group of overlapping entries.
For example, I have data such as:
fromyear  toyear  makename   modelname       submodelname  wheelbase  BedLength  BedTypeName  bodytype  note1  Note2  note3  partterminologyname  exppartno
2008      2012    Chevrolet  Silverado 1500  LT            NULL       78.00      Fleetside    NULL      Black  NULL   NULL   Truck Bed Mat        37807
2010      2010    Chevrolet  Silverado 1500  LT            NULL       78.00      Fleetside    NULL      Black  NULL   NULL   Truck Bed Mat        37807
2014      2017    Chevrolet  Silverado 1500  LT            NULL       78.00      Fleetside    NULL      Black  NULL   NULL   Truck Bed Mat        37807
Since I don't need to keep the original rows, I've moved my focus to an UPDATE query that selects the minimum and maximum years, but adding a correlated subquery like
(SELECT MIN(p2.fromyear)
FROM prod AS p2
WHERE p1.fromyear > 0
AND p2.toyear >= p1.fromyear
AND p2.fromyear < p1.fromyear
AND ISNULL(p2.makename, '') = ISNULL(p1.makename, '')
AND ISNULL(p2.modelname, '') = ISNULL(p1.modelname, '')
AND ISNULL(p2.submodelname, '') = ISNULL(p1.submodelname, '')
AND ISNULL(FLOOR(p2.wheelbase), 0) = ISNULL(FLOOR(p1.wheelbase), 0)
AND ISNULL(FLOOR(p2.BedLength), 0) = ISNULL(FLOOR(p1.BedLength), 0)
AND ISNULL(p2.BedTypeName, '') = ISNULL(p1.BedTypeName, '')
AND ISNULL(p2.bodytype, '') = ISNULL(p1.bodytype, '')
AND ISNULL(p2.note1, '') = ISNULL(p1.note1, '')
AND ISNULL(p2.Note2, '') = ISNULL(p1.Note2, '')
AND ISNULL(p2.note3, '') = ISNULL(p1.note3, '')
AND ISNULL(p2.exppartno, '') = ISNULL(p1.exppartno, '')) AS newfrom
causes the query to run for an excessive amount of time (the table has over 150k rows).
After doing an UPDATE to merge the years, I can simply remove any duplicate rows.
The desired result would return only two rows for this model: 2008-2012 and 2014-2017.
My original idea was to simply select MIN(fromyear) and MAX(toyear), but that returns 2008-2017, which wrongly includes 2013 as a valid year.
Is there a simple way to write a query that handles overlapping years like this? Nothing I found in my searches involved matching on multiple columns.
I would suggest joining to a date table containing a sequential list of years that covers the full range of years in the source data:
year
-----
...
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
...
Joining your source table to the date table ON (year >= fromyear AND year <= toyear) gives the following results:
year fromyear toyear vehicle_descriptor
2008 2008 2012 Chevrolet...
2009 2008 2012 Chevrolet...
2010 2008 2012 Chevrolet...
2011 2008 2012 Chevrolet...
2012 2008 2012 Chevrolet...
2010 2010 2010 Chevrolet...
2014 2014 2017 Chevrolet...
2015 2014 2017 Chevrolet...
2016 2014 2017 Chevrolet...
2017 2014 2017 Chevrolet...
Then group (or select distinct) the rows to eliminate duplicate years. (I'm using "vehicle_descriptor" as a shorthand for all the columns that uniquely identify a vehicle in your source data.)
On the deduplicated results, add a column as follows:
(year - ROW_NUMBER() OVER (PARTITION BY vehicle_descriptor ORDER BY year ASC) ) AS year_group
Within each vehicle_descriptor, this produces the same number for every year in a continuous run of years, and a different number for each separate run.
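The year - row_number arithmetic can be checked outside the database. This is a minimal Python sketch, assuming the ranges for one vehicle_descriptor are given as (fromyear, toyear) pairs. The function name `year_islands` is hypothetical. The set comprehension stands in for the calendar join plus SELECT DISTINCT, and `itertools.groupby` stands in for GROUP BY year_group.

```python
from itertools import groupby

def year_islands(ranges):
    """Collapse (fromyear, toyear) ranges for ONE vehicle into continuous
    (first_year, last_year) runs using the year - row_number trick."""
    # Expand each range into individual years and drop duplicates
    # (the calendar-table join followed by SELECT DISTINCT).
    years = sorted({y for lo, hi in ranges for y in range(lo, hi + 1)})
    # year - row_number stays constant while the years are consecutive.
    keyed = [(year - rn, year) for rn, year in enumerate(years, start=1)]
    runs = []
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        group_years = [year for _, year in group]
        # Years are sorted, so first/last are the MIN and MAX of the run.
        runs.append((group_years[0], group_years[-1]))
    return runs

print(year_islands([(2008, 2012), (2010, 2010), (2014, 2017)]))
# [(2008, 2012), (2014, 2017)]
```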
year fromyear toyear vehicle_descriptor row_number year_group (year - row_number)
2008 2008 2012 Chevrolet... 1 2007
2009 2008 2012 Chevrolet... 2 2007
2010 2008 2012 Chevrolet... 3 2007
2011 2008 2012 Chevrolet... 4 2007
2012 2008 2012 Chevrolet... 5 2007
2010 2010 2010 Chevrolet... (this row removed as year 2010 is a duplicate)
2014 2014 2017 Chevrolet... 6 2008
2015 2014 2017 Chevrolet... 7 2008
2016 2014 2017 Chevrolet... 8 2008
2017 2014 2017 Chevrolet... 9 2008
Finally, once you have this year_group, simply group the rows in the way you originally envisaged, by vehicle_descriptor and year_group, and select the MIN(year) and MAX(year).
The year_group value has no particular significance and is not retained in the final results; it's just there to tell the sequences apart. It works because it increments every time there is a gap in the year sequence (and it increments by the size of the gap).
I hope I've explained that satisfactorily. I'm not at my desktop PC, so I've written it all out by hand! If there's anything unclear, or you need a code example, let me know and I'll come back to you.
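Here is a minimal end-to-end sketch of the approach, run against SQLite through Python's `sqlite3` module so it is self-contained. It makes several assumptions. The table `prod` is cut down, and its `makename`, `modelname`, `submodelname` and `exppartno` columns stand in for vehicle_descriptor. A `years` table is populated for 1990-2030. The SQLite build must support window functions (3.25 or later). The same CTE/ROW_NUMBER shape should carry over to SQL Server, but this sketch has not been checked there.

```python
import sqlite3

# Hypothetical, cut-down fitment table using the question's sample rows.
rows = [
    (2008, 2012, "Chevrolet", "Silverado 1500", "LT", "37807"),
    (2010, 2010, "Chevrolet", "Silverado 1500", "LT", "37807"),
    (2014, 2017, "Chevrolet", "Silverado 1500", "LT", "37807"),
]

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE prod (fromyear INTEGER, toyear INTEGER, makename TEXT,
               modelname TEXT, submodelname TEXT, exppartno TEXT)""")
con.executemany("INSERT INTO prod VALUES (?, ?, ?, ?, ?, ?)", rows)

# Calendar ("date") table covering the range of years in the source data.
con.execute("CREATE TABLE years (year INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO years VALUES (?)", [(y,) for y in range(1990, 2031)])

query = """
WITH expanded AS (   -- one row per vehicle per covered year, duplicates removed
    SELECT DISTINCT p.makename, p.modelname, p.submodelname, p.exppartno, y.year
    FROM prod AS p
    JOIN years AS y ON y.year >= p.fromyear AND y.year <= p.toyear
),
grouped AS (         -- year - row_number is constant within each run of years
    SELECT *, year - ROW_NUMBER() OVER (
               PARTITION BY makename, modelname, submodelname, exppartno
               ORDER BY year) AS year_group
    FROM expanded
)
SELECT makename, modelname, submodelname, exppartno,
       MIN(year) AS fromyear, MAX(year) AS toyear
FROM grouped
GROUP BY makename, modelname, submodelname, exppartno, year_group
ORDER BY fromyear
"""
result = con.execute(query).fetchall()
for r in result:
    print(r)
```

DISTINCT, GROUP BY and PARTITION BY all treat NULLs as equal to one another, so the ISNULL wrappers from the question's subquery are not needed for grouping. If the FLOOR rounding of wheelbase and BedLength is wanted, it would still have to be applied in the `expanded` step.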
EXCEL: When I format the number 43466 as a date I get 1/1/2019
SQL Server: CAST(43466 AS DATETIME) yields Jan 3 2019
Which is correct?
How can I get the correct date in SQL Server?
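The two results differ by exactly two days because the two systems count from different starting points. Below is a Python sketch of both conventions; the function names are hypothetical. Excel's default 1900 date system numbers 1900-01-01 as serial 1. It also counts a non-existent 1900-02-29 as serial 60, so for serials of 61 and above its effective base is 1899-12-30. SQL Server's datetime counts whole days from 1900-01-01 as 0.

```python
from datetime import date, timedelta

def excel_serial_to_date(serial: int) -> date:
    """Excel 1900 date system: serial 1 = 1900-01-01; serial 60 is the
    fictitious 1900-02-29, so from 61 onward the effective base is 1899-12-30."""
    if serial < 1:
        raise ValueError("serial must be >= 1 in the 1900 date system")
    if serial == 60:
        raise ValueError("serial 60 is Excel's fictitious 1900-02-29")
    base = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return base + timedelta(days=serial)

def sqlserver_int_to_date(n: int) -> date:
    """CAST(n AS DATETIME) for an integer n: whole days counted from 1900-01-01."""
    return date(1900, 1, 1) + timedelta(days=n)

print(excel_serial_to_date(43466))   # 2019-01-01
print(sqlserver_int_to_date(43466))  # 2019-01-03
```

Both results are correct for their own epoch. For Excel serials of 61 and above, the same reasoning suggests DATEADD(day, 43466, '18991230') in T-SQL. Treat that as a sketch to verify against your own data.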
I've spent plenty of time Googling and have found possible solutions to much simpler tables.
My company uses BGInfo religiously to keep track of ~80 properties on over 200 web servers. Those fields are a mix of server stats (IP address, OSVer, HyperV host) and versions of various installed software components. A scheduled task writes this information from all web servers to a single database every day (currently >300K records). We need a query (to eventually be fed into a report) that tells us when something has changed on any given web server: a sort of automated change control, if you will.
Example: WebSvr_XYZ had 2G of RAM from inception until additional RAM was allocated a few months later. Then, a year after that, it was given a new IP address.
Server Time_stamp Host IP RAM
-------------------------------------------------------
WebSvr_XYZ June 1, 2016 Virt5a 192.168.10.45 2G
WebSvr_XYZ June 2, 2016 Virt5a 192.168.10.45 2G
WebSvr_XYZ Aug 20, 2016 Virt5a 192.168.10.45 4G
WebSvr_XYZ Aug 21, 2016 Virt5a 192.168.10.45 4G
WebSvr_XYZ July 18, 2017 Virt5a 192.168.20.105 4G
WebSvr_XYZ July 19, 2017 Virt5a 192.168.20.105 4G
WebSvr_XYZ July 20, 2017 Virt5a 192.168.20.105 4G
When run against WebSvr_XYZ's 540+ records, the output would be:
June 1, 2016 Virt5a 192.168.10.45 2G
Aug 20, 2016 Virt5a 192.168.10.45 4G
July 18, 2017 Virt5a 192.168.20.105 4G
I've tried a SELECT DISTINCT against the table, then joining it back to the full table on all relevant columns and using MIN(Time_stamp) to get the first occurrence. But I either get bad timestamps or no results at all.
Use row_number() to number each server's events in sequence. You can then INNER JOIN the result to itself, comparing each row with the one before it:
; with cte as
(
-- number each server's snapshots in time order
select *, rn = row_number() over (partition by Server order by Time_stamp)
from yourtable
)
select *
from cte c1
inner join cte c2 on c1.Server = c2.Server
and c1.rn = c2.rn - 1  -- c1 is the snapshot just before c2
where c1.Host <> c2.Host
or c1.IP <> c2.IP
or c1.RAM <> c2.RAM
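The self-join above reports only transitions, so the server's first snapshot (June 1, 2016 in the example) does not appear. Below is a sketch of a LEFT JOIN variant that also keeps the first snapshot, run on SQLite via Python's `sqlite3` so it is self-contained. It assumes the table and column names from the example, timestamps stored as ISO-8601 text so that text ordering is chronological, and an SQLite build with window functions (3.25+). `<>` is never true when either side is NULL, so nullable columns would need extra handling.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourtable (Server TEXT, Time_stamp TEXT, Host TEXT, IP TEXT, RAM TEXT)")
con.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?, ?)", [
    ("WebSvr_XYZ", "2016-06-01", "Virt5a", "192.168.10.45", "2G"),
    ("WebSvr_XYZ", "2016-06-02", "Virt5a", "192.168.10.45", "2G"),
    ("WebSvr_XYZ", "2016-08-20", "Virt5a", "192.168.10.45", "4G"),
    ("WebSvr_XYZ", "2016-08-21", "Virt5a", "192.168.10.45", "4G"),
    ("WebSvr_XYZ", "2017-07-18", "Virt5a", "192.168.20.105", "4G"),
    ("WebSvr_XYZ", "2017-07-19", "Virt5a", "192.168.20.105", "4G"),
    ("WebSvr_XYZ", "2017-07-20", "Virt5a", "192.168.20.105", "4G"),
])

query = """
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Server ORDER BY Time_stamp) AS rn
    FROM yourtable
)
SELECT c2.Time_stamp, c2.Host, c2.IP, c2.RAM
FROM cte AS c2
LEFT JOIN cte AS c1                      -- c1 = previous snapshot, if any
       ON c1.Server = c2.Server AND c1.rn = c2.rn - 1
WHERE c1.rn IS NULL                      -- first snapshot for the server
   OR c1.Host <> c2.Host OR c1.IP <> c2.IP OR c1.RAM <> c2.RAM
ORDER BY c2.Server, c2.rn
"""
changes = con.execute(query).fetchall()
for row in changes:
    print(row)
```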
Interestingly enough, I found no post about this specific but basic issue.
Goal: set docstatus = 0 on the latest budgetid record, then set docstatus = 1 on the next-to-last budgetid record. I am trying this from PHP but am also testing in SQL Server SEM, and it fails there too.
My SQL Server statement:
select
budgetid, docstatus, datechanged
from
ccy_budget
where
activityid = 11111
order by
datechanged desc
limit 1,1;
Error that occurs in SEM is:
Incorrect syntax near 'limit'.
Yet on w3schools this sample SQL works just fine:
SELECT *
FROM Customers
ORDER BY postalcode DESC
LIMIT 1,1;
Seems so simple, surely I am missing something fundamental.
Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64)
Apr 2 2010 15:48:46
Copyright (c) Microsoft Corporation
Enterprise Edition (64-bit) on Windows NT 6.2 <X64> (Build 9200: ) (Hypervisor)
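The equivalent syntax in SQL Server would be:
select budgetid, docstatus, datechanged
from ccy_budget
where activityid = 11111
order by datechanged desc
offset 1 rows fetch next 1 rows only;
However, OFFSET/FETCH is only available from SQL Server 2012 onward, so on your version (2008 R2) you have to do something like this:
;with cte
as
(
select *, row_number() over (order by datechanged desc) as rn
from ccy_budget
where activityid = 11111
)
select * from cte where rn = 2  -- rn = 1 is the latest record, rn = 2 the next-to-last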
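Since the stated goal is an UPDATE rather than a SELECT, the same ROW_NUMBER ranking can drive the update through a CTE. Below is a sketch run on SQLite via Python's `sqlite3`. The sample rows, the ISO-text `datechanged` values, the initial NULL docstatus and the helper `set_status_by_rank` are all hypothetical. The statement only uses a CTE and ROW_NUMBER, both of which SQL Server 2008 R2 supports, but this sketch has not been run there.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE ccy_budget (budgetid INTEGER PRIMARY KEY, activityid INTEGER,
               docstatus INTEGER, datechanged TEXT)""")
con.executemany("INSERT INTO ccy_budget VALUES (?, ?, ?, ?)", [
    (1, 11111, None, "2017-01-05"),
    (2, 11111, None, "2017-02-10"),
    (3, 11111, None, "2017-03-15"),
    (4, 22222, None, "2017-03-20"),   # other activity: must stay untouched
])

def set_status_by_rank(rank: int, status: int, activityid: int) -> None:
    # rank 1 = latest datechanged, rank 2 = next-to-last, and so on.
    # Parameters bind in order of appearance: activityid, status, rank.
    con.execute("""
        WITH ranked AS (
            SELECT budgetid,
                   ROW_NUMBER() OVER (ORDER BY datechanged DESC) AS rn
            FROM ccy_budget
            WHERE activityid = ?
        )
        UPDATE ccy_budget
        SET docstatus = ?
        WHERE budgetid IN (SELECT budgetid FROM ranked WHERE rn = ?)
    """, (activityid, status, rank))

set_status_by_rank(1, 0, 11111)   # latest record -> docstatus 0
set_status_by_rank(2, 1, 11111)   # next-to-last record -> docstatus 1
rows = con.execute("SELECT budgetid, docstatus FROM ccy_budget ORDER BY budgetid").fetchall()
print(rows)
```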
I have a prototype of the SQL query (the actual query is too large to post):
SELECT Site, Risk_Time_Stamp,COMPUTER_NAME, [IP_ADDR1_TEXT],Number_of_Risks
FROM dbo.sem_computer
WHERE [dbo].[V_SEM_COMPUTER].COMPUTER_ID = SEM_COMPUTER.COMPUTER_ID
GROUP BY Site, Risk_Time_Stamp,COMPUTER_NAME, [IP_ADDR1_TEXT],Number_of_Risks
That outputs:
Site Risk_Time_Stamp COMPUTER_NAME IP_ADDR1_TEXT Number_of_Risks
16K987 Aug 14, 2015 ADBF8J2 10.90.0.52 2
16K987 Aug 14, 2015 AD25N10 10.51.0.80 1
16K987 Aug 14, 2015 N20C0F8J2 10.18.0.79 1
How can I create a query that outputs the site along with a column named RISK STATISTICS that contains a table, i.e.:
SITE RISK STATISTICS
16K987 Risk_Time_Stamp COMPUTER_NAME IP_ADDR1_TEXT Number_of_Risks
Aug 14, 2015 ADBF8J2 10.90.0.52 2
Aug 14, 2015 AD25N10 10.51.0.80 1
Aug 14, 2015 N20C0F8J2 10.18.0.79 1
#sean-lange
I'm trying to create a flat Excel file to load into Tableau. Each site will be plotted on a map, and if it has any risks, a hover-over will show their details.
A site can have zero or more risks, hence the need for a column with a table value, i.e. a column holding an array.
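A flat file cannot hold a true nested table. One common workaround, not taken from this thread, is to pack each site's risk rows into a single delimited text column. Below is a minimal Python sketch using the sample rows above. The function `flatten_risks`, the " | " separator and the risk-free site "99X001" are hypothetical. Passing the full site list lets sites with zero risks still appear, with an empty column. In SQL Server 2008 the same packing is usually done with FOR XML PATH string concatenation.

```python
from collections import defaultdict

# Rows shaped like the prototype query's output (values from the question).
risk_rows = [
    ("16K987", "Aug 14, 2015", "ADBF8J2", "10.90.0.52", 2),
    ("16K987", "Aug 14, 2015", "AD25N10", "10.51.0.80", 1),
    ("16K987", "Aug 14, 2015", "N20C0F8J2", "10.18.0.79", 1),
]

def flatten_risks(all_sites, risk_rows, sep=" | "):
    """Return one (site, risk_statistics) pair per site, where
    risk_statistics packs every risk row into one delimited string."""
    by_site = defaultdict(list)
    for site in all_sites:
        by_site[site]            # register sites with zero risks up front
    for site, stamp, name, ip, risks in risk_rows:
        by_site[site].append(f"{stamp} {name} {ip} {risks}")
    return [(site, sep.join(entries)) for site, entries in by_site.items()]

flat = flatten_risks(["16K987", "99X001"], risk_rows)
for site, stats in flat:
    print(site, "->", stats)
```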
I am confused by statements in "Why is 1899-12-30 the zero date in Access / SQL Server instead of 12/31?"
Was there a date type in SQL Server before the 2008 version? I cannot find one.
In SQL Server 2008 the zero date is 0001-01-01. Were there any date types in previous SQL Server versions, and if so, how is this backward compatible?
No. Prior to SQL Server 2008, there were only the datetime and smalldatetime types.
Data type      Range                                        Accuracy
datetime       January 1, 1753, through December 31, 9999  3.33 milliseconds
smalldatetime  January 1, 1900, through June 6, 2079        1 minute
Please see: Date and Time Data in SQL Server 2008, specifically the bit that says "Date/Time Data Types Introduced in SQL Server 2008"
For the SQL Server datetime data type, the minimum date that can be stored is 1 Jan 1753.
However, datetime values are stored as numeric offsets from a base date of 1900-01-01 00:00:00.000:
SELECT CAST(-0.25 AS DATETIME) /*1899-12-31 18:00:00.000*/
SELECT CAST(0 AS DATETIME) /*1900-01-01 00:00:00.000*/
SELECT CAST(0.25 AS DATETIME) /*1900-01-01 06:00:00.000*/
1899-12-30 has no significance in SQL Server.
SELECT CAST(CAST('1899-12-30' AS DATETIME) AS FLOAT) /*Returns -2*/
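The offset convention behind these casts can be mirrored in Python. Below is a sketch with hypothetical function names. It treats a datetime as a number of days from 1900-01-01 00:00:00, with the fractional part as a fraction of a day. It ignores datetime's 1/300-second rounding and its 1753 lower bound.

```python
from datetime import datetime, timedelta

# Base date that SQL Server's datetime offsets are counted from.
SQLSERVER_BASE = datetime(1900, 1, 1)

def float_to_datetime(value: float) -> datetime:
    """Mirror CAST(value AS DATETIME): days (and fractions of a day) from the base."""
    return SQLSERVER_BASE + timedelta(days=value)

def datetime_to_float(value: datetime) -> float:
    """Mirror CAST(datetime AS FLOAT): offset from the base in days."""
    return (value - SQLSERVER_BASE) / timedelta(days=1)

print(float_to_datetime(-0.25))                   # 1899-12-31 18:00:00
print(float_to_datetime(0.25))                    # 1900-01-01 06:00:00
print(datetime_to_float(datetime(1899, 12, 30)))  # -2.0
```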