Why is this join taking so long? - sql-server

I have the following query that I am running on my database server but it takes about 30 seconds to run and I can't work out why this is.
SELECT *
FROM [dbo].[PackageInstance] AS packInst
INNER JOIN [dbo].[PackageDefinition] AS packageDef
ON packInst.[PackageDefinitionID] = packageDef.[PackageDefinitionID]
LEFT OUTER JOIN [dbo].[PackageInstanceContextDef] AS contextDef
ON packInst.[PackageInstanceID] = contextDef.[PackageInstanceID]
This produced the following execution plan which to me looks to be good....so I can't understand why it takes so much time to execute where the resulting data is only 100,000 records (which should be a walk in the park for SQL Server).
Any ideas what could be causing this long execution time?
I have looked at the query in Profiler to see what the stats where on it and they are as follows:
CPU - 4711
Reads - 744453
Writes - 9
Duration - 26329
The following are the table definitions:
CREATE TABLE [dbo].[PackageDefinition](
[PackageDefinitionID] [int] IDENTITY(1,1) NOT NULL,
[ts] [timestamp] NOT NULL,
[ProgramID] [int] NULL,
[VendorID] [int] NULL,
[PackageExecutionTypeID] [int] NULL,
[PackageDefinitionStatusID] [int] NOT NULL,
[IsInternal] [bit] NOT NULL,
[Name] [dbo].[D_Name] NOT NULL,
[Description] [dbo].[D_Description] NOT NULL,
[CreatedDate] [datetime] NOT NULL,
[PublishedDate] [datetime] NULL,
[OwnerUserGuid] [uniqueidentifier] NOT NULL,
[ProcessDefinitionMainID] [int] NULL,
[KeyInfoHtml] [nvarchar](max) NULL,
[DescriptionHtml] [nvarchar](max) NULL,
[WhatToExpectHtml] [nvarchar](max) NULL,
[BestPracticesHtml] [nvarchar](max) NULL,
[RecommendedJourneysHtml] [nvarchar](max) NULL,
[RequiresSLAAgreement] [bit] NOT NULL,
[SLAFileAssetID] [int] NULL,
[ImageDataID] [int] NULL,
[VideoHtml] [nvarchar](max) NULL,
[VideoAssetID] [int] NULL,
[UseMapCosts] [bit] NOT NULL,
[CostMin] [money] NOT NULL,
[CostMax] [money] NOT NULL,
[LandingPageVisitCount] [int] NOT NULL,
[IsDeleted] [dbo].[D_IsDeleted] NOT NULL,
[CreatedByUserGuid] [uniqueidentifier] NOT NULL,
[OrderHtml] [nvarchar](max) NULL,
CONSTRAINT [PK_PackageDefinition] PRIMARY KEY CLUSTERED
(
[PackageDefinitionID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[PackageInstance](
[PackageInstanceID] [int] IDENTITY(1,1) NOT NULL,
[ts] [timestamp] NOT NULL,
[PackageDefinitionID] [int] NOT NULL,
[PackageStatusID] [int] NOT NULL,
[Name] [dbo].[D_Description] NOT NULL,
[CampaignID] [int] NULL,
[MarketingPlanID] [int] NULL,
[CountryID] [int] NULL,
[DateEntered] [datetime] NULL,
[DateExecuted] [datetime] NULL,
[ProcessID] [int] NULL,
[OrderedByUserGuid] [uniqueidentifier] NULL,
[RequestedByUserGuid] [uniqueidentifier] NULL,
[SLAEndDate] [datetime] NULL,
CONSTRAINT [PK_PackageInstance] PRIMARY KEY CLUSTERED
(
[PackageInstanceID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[PackageInstanceContextDef](
[PackageInstanceContextDefID] [int] IDENTITY(1,1) NOT NULL,
[ts] [timestamp] NOT NULL,
[PackageInstanceID] [int] NOT NULL,
[ContextObjectDefID] [int] NOT NULL,
[EnteredFieldValue] [varchar](max) NULL,
[SelectedListValueID] [int] NULL,
[AssetIdsString] [nvarchar](max) NULL,
[SelectedListValueIdsString] [nvarchar](max) NULL,
[ContextObjectFieldName] [nvarchar](30) NOT NULL,
CONSTRAINT [PK_PackageInstanceContextDef] PRIMARY KEY CLUSTERED
(
[PackageInstanceContextDefID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Remove the * in SELECT *
It will always scan because you ask for all columns. And do you have clustered indexes?

The answer turned out to be what #MartinSmith suggested. Because the PackageDefinition table contained about 8 NVARCHAR(MAX) columns, when the resulting join was created and that was over 100k rows, this was causing the varchar(max) values to be re-read over and over and they exist in out of row pages. Hence the large number of logical reads.
Thanks all for your support, just have to figure out to make the entity framework produce the query that I want.

What happens if you add the following index...
CREATE NONCLUSTERED INDEX ix ON PackageDefinition(PackageDefinitionID)
...and try the following to reduce the width of the data going into the sort?
SELECT packInst.*,
packageDef2.*,
contextDef.*
FROM [dbo].[PackageInstance] AS packInst
INNER MERGE JOIN [dbo].[PackageDefinition] AS packageDef
ON packInst.[PackageDefinitionID] = packageDef.[PackageDefinitionID]
LEFT OUTER MERGE JOIN [dbo].[PackageInstanceContextDef] AS contextDef
ON packInst.[PackageInstanceID] = contextDef.[PackageInstanceID]
INNER MERGE JOIN [dbo].[PackageDefinition] AS packageDef2
ON packageDef.[PackageDefinitionID] = packageDef2.[PackageDefinitionID]
OF course * should not be used as even if you need all columns you definitely won't need the same columns twice as the result of the JOIN but this is just to maintain the semantics of your original query.

Related

Is Correct to group all general settings in one table or create table for each one?

I have a table with 9 columns sharing same data (GN_PLACE_ID,GN_AR_NAME,GN_EN_NAME)
is good why to save them in one table or create table for each one ?
CREATE TABLE [dbo].[GENERAL](
[GN_ID] [int] IDENTITY(1,1) NOT NULL,
[GN_PLACE_ID] [int] NULL,
[GN_AR_NAME] [nvarchar](1000) NULL,
[GN_EN_NAME] [nvarchar](1000) NULL,
[GN_IS_DEVICE_TYPE] [bit] NULL,
[GN_IS_ISSUE_TYPE] [bit] NULL,
[GN_IS_REPAIR_TYPE] [bit] NULL,
[GN_IS_CANCEL_REASON] [bit] NULL,
[GN_IS_SUSPEND_REASON] [bit] NULL,
[GN_IS_MANUFACTURER] [bit] NULL,
[GN_IS_DEPARTMENT] [bit] NULL,
[GN_IS_PRIORITY] [bit] NULL,
[GN_IS_SUPPORT_SECTION] [bit] NULL,
CONSTRAINT [PK_GENERAL] PRIMARY KEY CLUSTERED
(
[GN_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
thanks

Msg 8152 String or binary data would be truncated

I know this has been asked before, but none of the answers are fixing the issue, need another set of eyes. I'm trying to create a table and seed it with some values. I don't have any value constraints so I'm not sure why I'm getting the error I'm getting.
CREATE TABLE [dbo].[AvailableTime]
(
[Id] [INT] IDENTITY(1,1) NOT NULL,
[TimeString] [NCHAR] NOT NULL,
[TimeValue] [NCHAR] NOT NULL,
[CompanyId] [INT] NOT NULL,
[CompanyName] [NVARCHAR](MAX) NULL,
[LocationId] [INT] NOT NULL,
[LocationName] [NVARCHAR] NOT NULL,
[IsClaimed] [BIT] NOT NULL,
CONSTRAINT [PK_AvailableTime]
PRIMARY KEY CLUSTERED ([Id] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
INSERT INTO dbo.AvailableTime (TimeString, TimeValue, CompanyId, CompanyName,
LocationId, LocationName, IsClaimed)
VALUES ('2019-01-07T00:00:00', '12:00 AM', 1, 'Company',
2, 'Inver Grove', 'FALSE');
You should never specify a CHAR, VARCHAR, NCHAR or NVARCHAR column (or variable, or parameter) without specifying an explicit length!
If you omit the specific length, in some cases you'll end up with variable or column of exactly ONE character of length! That's typically NOT what you want!
ALSO: why are you storing TimeString and TimeValue as NCHAR?? Doesn't make any sense - use the most appropriate datatype - and here, that would be DATETIME2(n) and TIME.
So define your table like this:
CREATE TABLE [dbo].[AvailableTime]
(
[Id] [INT] IDENTITY(1,1) NOT NULL,
[TimeString] DATETIM2(0) NOT NULL,
[TimeValue] TIME(0) NOT NULL,
[CompanyId] [INT] NOT NULL,
[CompanyName] [NVARCHAR](MAX) NULL,
[LocationId] [INT] NOT NULL,
[LocationName] [NVARCHAR](100) NOT NULL,
[IsClaimed] [BIT] NOT NULL,
and you should be fine.
You need declare the size of your CHAR & VARCHAR fields in the table:
CREATE TABLE [dbo].[AvailableTime]
(
[Id] [INT] IDENTITY(1,1) NOT NULL,
[TimeString] [NCHAR](50) NOT NULL,
[TimeValue] [NCHAR](50) NOT NULL,
[CompanyId] [INT] NOT NULL,
[CompanyName] [NVARCHAR](MAX) NULL,
[LocationId] [INT] NOT NULL,
[LocationName] [NVARCHAR](50) NOT NULL,
[IsClaimed] [BIT] NOT NULL,
CONSTRAINT [PK_AvailableTime]
PRIMARY KEY CLUSTERED ([Id] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
You have a error in insert query, mismatch types in last column.
INSERT INTO dbo.AvailableTime (TimeString, TimeValue, CompanyId, CompanyName,
LocationId, LocationName, IsClaimed)
VALUES ('2019-01-07T00:00:00', '12:00 AM', 1, 'Company', 2, 'Inver Grove', 0);

How do I speed up an insert that deals with huge amount of data and non indexed joins?

I have the following query that will be used to fetch data from legacy tables. It's no surprise but the amount of data is huge and thus it takes a long time. The first select takes 40 minutes to run using an empty dbo.commodities_copy table as a starting point and yields around 26,000 rows. Keep in mind that there are separate databases: STAGING and PRESTAGING and that some joins are made using non-PK fields, which is most definately making an impact in its performance. This is something that I cannot fix, due to the way data was organized from the start. Also the transaction table has around 1 million rows, which also impacts heavily on performance. The entire script takes a total of 3.5 hours to execute when using an EMPTY dbo.commodities_copy table. I have not tested on insertion to a table with data.
The goal of the query is to get commodity information from the transaction table (if you guessed this was supposed to be noSQL data, you guessed right) and if the commodity code exists in the commodity table, do not insert a commodity in it.
The group bys are absolutely needed to get around duplicates, since a transactions may share the same commodity. The commodity code should be unique in the commodities table, but currently it is not - though if it helps, it's possible we could alter it.
What can I do to speed it up?
INSERT INTO STAGING.dbo.commodities_copy
(commodity_code,
short_description_sched_b,
short_description_sched_tsusa,
long_description_sched_b,
long_description_sched_tsusa,
measurement_unit_1_sched_b,
measurement_unit_1_sched_tsusa,
measurement_unit_2_sched_b,
measurement_unit_2_sched_tsusa,
end_use_sched_b,
end_use_sched_tsusa,
year,
created_by,
created_on,
taxable_sched_b,
taxable_sched_tsusa,
non_taxable_sched_b,
non_taxable_sched_tsusa,
fk_sic_sched_b,
fk_sic_sched_tsusa,
chapter,
header,
sub_header,
needs_validation)
SELECT
--Distinct
Commodity_Code,
iif(miob2.DESC_COMM is null, UPPER(socrata.Commodity_Short_Name), miob2.DESC_COMM) as short_commmodity_description_b,
iif(mio2tsusa.DESC_COMM is null, UPPER(socrata.Commodity_Short_Name), mio2tsusa.DESC_COMM) as short_commmodity_description_tsusa,
socrata.Commodity_description as long_commodity_description_b,
socrata.Commodity_description as long_commodity_description_tsusa,
iif(miob2.UNIDAD is null, socrata.unit_1, miob2.UNIDAD) as unit_1_b,
iif(mio2tsusa.UNIDAD is null, socrata.unit_1, mio2tsusa.UNIDAD) as unit_1_tsusa,
MAX(socrata.unit_2) as unit_2_b,
MAX(socrata.unit_2) as unit_2_tsusa,
socrata.end_use_e as end_use_b,
socrata.end_use_i as end_use_tsusa,
MAX(socrata.[year]),
'system' as created_by,
getdate() as created_on,
miob.TRIBUTA as taxable_b,
miotsusa.TRIBUTA as taxable_tsusa,
miob.NTRIBUTA as non_taxable_b,
miotsusa.NTRIBUTA as non_taxable_tsusa,
sicb.id as sic_id_b,
sictsusa.id as sic_id_tsusa,
SUBSTRING(Commodity_Code, 1, 2) as chapter,
SUBSTRING(Commodity_Code, 1, 4) as header,
SUBSTRING(Commodity_Code, 1, 6) as sub_header,
0 as needs_validation
FROM PRE_STAGING.dbo.TRANSACTIONS_FROM_SOCRATA socrata
Left join PRE_STAGING.DBO.MIOB_TBL miob ON miob.COMM=socrata.Commodity_Code
Left join PRE_STAGING.dbo.MSCHB_TBL miob2 ON miob2.COMM=socrata.Commodity_Code
Left join PRE_STAGING.dbo.MIOTSUSA_TBL miotsusa ON miotsusa.COMM=socrata.Commodity_Code
Left join PRE_STAGING.dbo.MTSUSA_TBL mio2tsusa ON mio2tsusa.COMM=socrata.Commodity_Code
Left join STAGING.dbo.sics_altered sicb ON sicb.sic_code = miob.SIC
Left join STAGING.dbo.sics_altered sictsusa ON sictsusa.sic_code = miotsusa.SIC
WHERE NOT EXISTS
(Select Distinct commodity_code from STAGING.dbo.commodities_copy)
group by
Commodity_Code,
iif(miob2.DESC_COMM is null, UPPER(socrata.Commodity_Short_Name), miob2.DESC_COMM),
iif(mio2tsusa.DESC_COMM is null, UPPER(socrata.Commodity_Short_Name), mio2tsusa.DESC_COMM),
socrata.Commodity_description,
socrata.Commodity_description,
iif(miob2.UNIDAD is null, socrata.unit_1, miob2.UNIDAD),
iif(mio2tsusa.UNIDAD is null, socrata.unit_1, mio2tsusa.UNIDAD),
socrata.end_use_e,
socrata.end_use_i,
miob.TRIBUTA,
miotsusa.TRIBUTA,
miob.NTRIBUTA,
miotsusa.NTRIBUTA,
sicb.id,
sictsusa.id,
SUBSTRING(Commodity_Code, 1, 2),
SUBSTRING(Commodity_Code, 1, 4),
SUBSTRING(Commodity_Code, 1, 6)
The tables used are the following:
STAGING.dbo.commodities_copy:
CREATE TABLE [dbo].[commodities_copy](
[id] [bigint] IDENTITY(1,1) NOT NULL,
[chapter] [varchar](5) NULL,
[header] [varchar](5) NULL,
[sub_header] [varchar](10) NULL,
[commodity_code] [varchar](20) NULL,
[short_description_sched_b] [varchar](100) NULL,
[long_description_sched_b] [varchar](200) NULL,
[measurement_unit_1_sched_b] [varchar](5) NULL,
[measurement_unit_2_sched_b] [varchar](5) NULL,
[end_use_sched_b] [int] NULL,
[sitc_sched_b] [varchar](20) NULL,
[usda_sched_b] [int] NULL,
[hitech_sched_b] [int] NULL,
[naics_fk_id_sched_b] [bigint] NULL,
[short_description_sched_tsusa] [varchar](100) NULL,
[long_description_sched_tsusa] [varchar](200) NULL,
[measurement_unit_1_sched_tsusa] [varchar](5) NULL,
[measurement_unit_2_sched_tsusa] [varchar](5) NULL,
[end_use_sched_tsusa] [int] NULL,
[sitc_sched_tsusa] [varchar](20) NULL,
[usda_sched_tsusa] [int] NULL,
[hitech_sched_tsusa] [int] NULL,
[naics_fk_id_sched_tsusa] [bigint] NULL,
[year] [int] NOT NULL,
[created_on] [datetime] NOT NULL,
[created_by] [varchar](50) NULL,
[updated_on] [datetime] NULL,
[updated_by] [varchar](50) NULL,
[needs_validation] [bit] NOT NULL,
[taxable_sched_b] [nchar](3) NULL,
[non_taxable_sched_b] [nchar](3) NULL,
[taxable_sched_tsusa] [nchar](3) NULL,
[non_taxable_sched_tsusa] [nchar](3) NULL,
[fk_sic_sched_b] [bigint] NULL,
[fk_sic_sched_tsusa] [bigint] NULL
) ON [PRIMARY]
STAGING.dbo.sics_altered:
CREATE TABLE [dbo].[sics_altered](
[id] [bigint] IDENTITY(1,1) NOT NULL,
[sic_code] [varchar](4) NULL,
[sic_description] [varchar](max) NULL,
[created_on] [datetime] NOT NULL,
[created_by] [varchar](50) NOT NULL,
PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
The rest are in PRESTAGING:
PRESTAGING.dbo.TRANSACTIONS_FROM_SOCRATA:
This is the table with 1.3 million rows
CREATE TABLE [dbo].[TRANSACTIONS_FROM_SOCRATA](
[Trade] [varchar](255) NULL,
[Year] [varchar](255) NULL,
[Month] [varchar](50) NULL,
[Commodity_Code] [varchar](50) NULL,
[Commodity_Short_Name] [varchar](255) NULL,
[Commodity_description] [varchar](255) NULL,
[cty_code] [varchar](50) NULL,
[Country] [varchar](50) NULL,
[Subcountry_code] [varchar](50) NULL,
[district] [varchar](50) NULL,
[dist_name] [varchar](255) NULL,
[data] [varchar](50) NULL,
[sitc] [varchar](50) NULL,
[SITC_Short_Desc] [varchar](255) NULL,
[SITC_Long_Desc] [varchar](255) NULL,
[naics] [varchar](50) NULL,
[NAICS_description] [varchar](255) NULL,
[end_use_i] [varchar](50) NULL,
[end_use_e] [varchar](50) NULL,
[hts_desc] [varchar](255) NULL,
[unit_1] [varchar](50) NULL,
[qty_1] [varchar](50) NULL,
[unit_2] [varchar](50) NULL,
[qty_2] [varchar](50) NULL,
[ves_val_mo] [varchar](50) NULL,
[ves_wgt_mo] [varchar](50) NULL,
[cards_mo] [varchar](50) NULL,
[air_val_mo] [varchar](50) NULL,
[air_wgt_mo] [varchar](50) NULL,
[dut_val_mo] [varchar](50) NULL,
[cal_dut_mo] [varchar](50) NULL,
[con_cha_mo] [varchar](50) NULL,
[con_cif_mo] [varchar](50) NULL,
[gen_val_mo] [varchar](50) NULL,
[gen_cha_mo] [varchar](50) NULL,
[gen_cif_mo] [varchar](50) NULL,
[air_cha_mo] [varchar](50) NULL,
[ves_cha_mo] [varchar](50) NULL,
[cnt_cha_mo] [varchar](50) NULL,
[rev_data] [varchar](50) NULL
) ON [PRIMARY]
PRESTAGING.dbo.MIOB_TBL:
CREATE TABLE [dbo].[MIOB_TBL](
[id] [int] IDENTITY(1,1) NOT NULL,
[COMM] [nchar](10) NOT NULL,
[INSUMO] [nchar](3) NULL,
[PBTO] [nchar](4) NULL,
[SIC] [nchar](4) NULL,
[NAICS] [nchar](6) NULL,
[TRIBUTA] [nchar](3) NULL,
[NTRIBUTA] [nchar](3) NULL,
[LAST_UPDATE] [date] NULL,
[LAST_UPDATED_BY] [nchar](20) NULL,
[CREATION_DATE] [date] NULL,
[CREATED_BY] [nchar](15) NULL,
[migrated_on] [datetime] NOT NULL,
PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
PRESTAGING.dbo.MIOTSUSA_TBL:
CREATE TABLE [dbo].[MIOTSUSA_TBL](
[COMM] [nchar](10) NOT NULL,
[INSUMO] [nchar](3) NULL,
[PBTO] [nchar](4) NULL,
[SIC] [nchar](4) NULL,
[NAICS] [nchar](6) NULL,
[TRIBUTA] [nchar](3) NULL,
[NTRIBUTA] [nchar](3) NULL,
[id] [int] IDENTITY(1,1) NOT NULL,
[migrated_on] [datetime] NOT NULL
) ON [PRIMARY]
PRESTAGING.dbo.MSCHB_TBL:
CREATE TABLE [dbo].[MSCHB_TBL](
[id] [int] IDENTITY(1,1) NOT NULL,
[COMM] [nchar](10) NOT NULL,
[DESC_COMM] [nchar](50) NULL,
[UNIDAD] [nchar](3) NULL,
[LAST_UPDATE] [date] NULL,
[LAST_UPDATED_BY] [nchar](20) NULL,
[CREATION_DATE] [date] NULL,
[CREATED_BY] [nchar](15) NULL,
[migrated_on] [datetime] NOT NULL,
PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
PRESTAGING.dbo.MTSUSA_TBL
CREATE TABLE [dbo].[MTSUSA_TBL](
[COMM] [nchar](10) NOT NULL,
[DESC_COMM] [nchar](50) NULL,
[UNIDAD] [nchar](3) NULL,
[id] [int] IDENTITY(1,1) NOT NULL,
[migrated_on] [datetime] NOT NULL
) ON [PRIMARY]
Let me know if there's anything else I need to provide.
With all those left outer joins, the query optimizer has to start with TRANSACTION_FROM_SOCRATA, so I would start with that. The only filtering is the NOT IN clause--would that cut down the 1MM rows to something more reasonable? If not, you're pretty much doomed to running at least one table scan (and possibly several) on the entire table.
If filtering on Commodity_Code would significantly cut things down, that can only be done if the column is indexed, so that SQL can find and read only those rows. It can only do that if there is an index on column--otherwise you're back to a table scan. Similarly, having an index on commodity_code in table commodities_copy` would help as well, if that table is large.
As discussed in the comments, a NOT EXISTS check would be most efficient, written as a correlated subquery:
WHERE NOT EXISTS (select commodity_code
from STAGING.dbo.commodities_copy
where commodity_code = socrtata.Commodity_Code)
(I'd want to do a lot of testing on this, checking and double-checking everything. Improving performance is tricky, doubly so when done through SO.)
Try this,
create table #socrata(Commodity_Code varchar(100),unit_2_b varchar(50),unit_2_tsusa varchar(50),[year] varchar(50))
insert into #socrata
SELECT
Commodity_Code,
MAX(socrata.unit_2) as unit_2_b,
MAX(socrata.unit_2) as unit_2_tsusa,
MAX(socrata.[year]),
FROM PRE_STAGING.dbo.TRANSACTIONS_FROM_SOCRATA socrata
group by Commodity_Code
SELECT
--Distinct
Commodity_Code,
iif(miob2.DESC_COMM is null, UPPER(socrata.Commodity_Short_Name), miob2.DESC_COMM) as short_commmodity_description_b,
iif(mio2tsusa.DESC_COMM is null, UPPER(socrata.Commodity_Short_Name), mio2tsusa.DESC_COMM) as short_commmodity_description_tsusa,
socrata.Commodity_description as long_commodity_description_b,
socrata.Commodity_description as long_commodity_description_tsusa,
iif(miob2.UNIDAD is null, socrata.unit_1, miob2.UNIDAD) as unit_1_b,
iif(mio2tsusa.UNIDAD is null, socrata.unit_1, mio2tsusa.UNIDAD) as unit_1_tsusa,
unit_2_b,
unit_2_tsusa,
socrata.end_use_e as end_use_b,
socrata.end_use_i as end_use_tsusa,
[year],
'system' as created_by,
getdate() as created_on,
miob.TRIBUTA as taxable_b,
miotsusa.TRIBUTA as taxable_tsusa,
miob.NTRIBUTA as non_taxable_b,
miotsusa.NTRIBUTA as non_taxable_tsusa,
sicb.id as sic_id_b,
sictsusa.id as sic_id_tsusa,
SUBSTRING(Commodity_Code, 1, 2) as chapter,
SUBSTRING(Commodity_Code, 1, 4) as header,
SUBSTRING(Commodity_Code, 1, 6) as sub_header,
0 as needs_validation
FROM #socrata socrata
Left join PRE_STAGING.DBO.MIOB_TBL miob ON miob.COMM=socrata.Commodity_Code
Left join PRE_STAGING.dbo.MSCHB_TBL miob2 ON miob2.COMM=socrata.Commodity_Code
Left join PRE_STAGING.dbo.MIOTSUSA_TBL miotsusa ON miotsusa.COMM=socrata.Commodity_Code
Left join PRE_STAGING.dbo.MTSUSA_TBL mio2tsusa ON mio2tsusa.COMM=socrata.Commodity_Code
Left join STAGING.dbo.sics_altered sicb ON sicb.sic_code = miob.SIC
Left join STAGING.dbo.sics_altered sictsusa ON sictsusa.sic_code = miotsusa.SIC
WHERE NOT EXISTS
(Select commodity_code from STAGING.dbo.commodities_copy where commodity_code = socrtata.Commodity_Code)
if Read uncommitted data is not a concern then you can use with (nolock)
Also your exists clause was wrong and no need of distinct.check rest of the changes.

SQL Server DELETE performance issues

I have a table with approximately 1 million rows. Part of our maintenance involves deleting old row each day, but this is taking about 40 minutes.
The delete statement is:
DELETE
FROM [dbGlobalPricingMatrix].[dbo].[tblPricing]
WHERE type = 'car'
AND capid NOT IN
(SELECT cder_id FROM PUB_CAR.dbo.CapDer WHERE cder_discontinued
IS NULL OR DATEDIFF(dd,cder_discontinued,GETDATE()) <= 7)
AND source = #source
Is there anything I can do to improve the performance?
Thanks
As Requested:
CREATE TABLE [dbo].[tblPricing](
[id] [int] IDENTITY(1,1) NOT NULL,
[type] [varchar](50) NULL,
[capid] [int] NULL,
[source] [varchar](50) NULL,
[product] [varchar](50) NULL,
[term] [int] NULL,
[milespa] [int] NULL,
[maintained] [bit] NULL,
[price] [money] NULL,
[created] [datetime] NULL,
[updated] [datetime] NULL,
[notes] [varchar](1000) NULL,
[painttype] [char](1) NULL,
[activeflag] [bit] NULL,
[DealerId] [int] NULL,
[FunderId] [int] NULL,
[IsSpecial] [bit] NULL,
[username] [varchar](50) NULL,
[expiry] [datetime] NULL,
CONSTRAINT [PK_tblPricing] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[CAPDer](
[cder_ID] [int] NOT NULL,
[cder_capcode] [char](20) NULL,
[cder_mancode] [int] NULL,
[cder_rancode] [int] NULL,
[cder_modcode] [int] NULL,
[cder_trimcode] [int] NULL,
[cder_name] [varchar](50) NULL,
[cder_introduced] [datetime] NULL,
[cder_discontinued] [datetime] NULL,
[cder_orderno] [int] NULL,
[cder_vehiclesector] [tinyint] NULL,
[cder_doors] [tinyint] NULL,
[cder_drivetrain] [char](1) NULL,
[cder_fueldelivery] [char](1) NULL,
[cder_transmission] [char](1) NULL,
[cder_fueltype] [char](1) NULL,
CONSTRAINT [PK_CapDer] PRIMARY KEY CLUSTERED
(
[cder_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
Okay, as I suspected, you do not seem to have indexes for the fields that you reference in your DELETE query. So, add indexes for type, capid, cder_discontinued, and source.
Additionally, you might want to try AND capid IN (SELECT cder_id FROM PUB_CAR.dbo.CapDer WHERE cder_discontinued IS NOT NULL AND DATEDIFF(dd,cder_discontinued,GETDATE()) > 7). The optimizer of MS-SQL-Server should actually be doing this for you, but you never know, it is worth trying.

Query slow after execute DBCC DROPCLEANBUFFERS

I have 2 tables Called "Users(60,0000 records),UserBasicInfo(60,0000 records)",The Users table have a clustered index on UsersID column.The UserBasicInfo table have a UsersID FK point to Users table UsersID column
and have Non-clustered on UpdateTime,UsersID column.
CREATE TABLE [dbo].[Users](
[UsersID] [int] IDENTITY(100000,1) NOT NULL,
[LoginUsersName] [nvarchar](50) NOT NULL,
[LoginUsersPwd] [nvarchar](50) NOT NULL,
[Email] [nvarchar](80) NOT NULL,
[IsEnable] [int] NOT NULL,
[CreateTime] [datetime] NOT NULL,
[LastLoginTime] [datetime] NOT NULL,
[LastLoginIp] [nvarchar](50) NOT NULL,
[UpdateTime] [datetime] NOT NULL,
CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED
(
[UsersID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[UserBasicInfo](
[UserBaicInfoID] [int] IDENTITY(1,1) NOT NULL,
[UsersID] [int] NOT NULL,
[ResumePoints] [nvarchar](50) NOT NULL,
[IsChineseOrEnglish] [int] NOT NULL,
[UserName] [nvarchar](50) NOT NULL,
[Sex] [int] NOT NULL,
[Height] [int] NOT NULL,
[Birthday] [datetime] NOT NULL,
[Age] [int] NOT NULL,
[IDCard] [nvarchar](50) NOT NULL,
[IsMarryed] [int] NOT NULL,
[NativePlace] [nvarchar](50) NOT NULL,
[PoliticalStatus] [int] NOT NULL,
[CurrentAddress] [nvarchar](50) NOT NULL,
[CurrentAddressDetail] [nvarchar](50) NOT NULL,
[WorkExperience] [int] NOT NULL,
[HighestEducation] [int] NOT NULL,
[LogoPath] [nvarchar](200) NULL,
[MobilePhone] [nvarchar](50) NOT NULL,
[Phone] [nvarchar](50) NULL,
[QQ] [nvarchar](50) NULL,
[Blog] [nvarchar](300) NULL,
[MicroBlog] [nvarchar](300) NULL,
[PositionDesired] [nvarchar](100) NOT NULL,
[IndustrySmallClass] [nvarchar](100) NULL,
[PositionName] [nvarchar](100) NOT NULL,
[PositionType] [int] NOT NULL,
[WorkAddressLarge] [int] NOT NULL,
[WorkAddressSmall] [nvarchar](200) NOT NULL,
[WorkAddressSmallText] [nvarchar](100) NOT NULL,
[Salary] [int] NOT NULL,
[HousingRequirement] [int] NOT NULL,
[ToWorkTime] [int] NOT NULL,
[ResumeState] [int] NOT NULL,
[IsSystemAdd] [int] NOT NULL,
[CreateTime] [datetime] NOT NULL,
[UpdateTime] [datetime] NOT NULL,
[RefreshDateTime] [datetime] NOT NULL,
[RefreshTime] [int] NOT NULL,
[TotalTime] [int] NOT NULL,
CONSTRAINT [PK_UserBasicInfo] PRIMARY KEY CLUSTERED
(
[UserBaicInfoID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
ALTER TABLE [dbo].[UserBasicInfo] WITH CHECK ADD CONSTRAINT [FK_UserBasicInfo_Users] FOREIGN KEY([UsersID])
REFERENCES [dbo].[Users] ([UsersID])
But execute below slow:
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
SELECT TOP 100 * FROM Users U INNER JOIN UserBasicInfo UB ON UB.UsersID=U.UsersID ORDER BY UB.UpdateTime DESC
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 3568 ms.
Then execute below fast:
SELECT TOP 100 * FROM Users U INNER JOIN UserBasicInfo UB ON UB.UsersID=U.UsersID ORDER BY UB.UpdateTime DESC
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 163 ms.
It is normal?
Yes, it is normal
Your 2 DBCC commands ahve removed query plans and cached data. Data has to be read from disk again which is probably the main overhead.
If you add SET STATISTICS IO ON you'll see more "physical page reads" and "read-ahead reads" after the DBCC as data is loaded from disk
I almost never run them. For more, see these dba.se questions
https://dba.stackexchange.com/a/10820/630
https://dba.stackexchange.com/q/7859/630

Resources