I have the following query that will be used to fetch data from legacy tables. Unsurprisingly, the amount of data is huge, so it takes a long time. The first select takes 40 minutes to run using an empty dbo.commodities_copy table as a starting point and yields around 26,000 rows. Keep in mind that there are two separate databases, STAGING and PRESTAGING, and that some joins are made on non-PK fields, which is most definitely hurting performance. This is something I cannot fix, because of the way the data was organized from the start. The transaction table also has around 1 million rows, which weighs heavily on performance. The entire script takes a total of 3.5 hours to execute when using an EMPTY dbo.commodities_copy table. I have not tested inserting into a table that already contains data.
The goal of the query is to get commodity information from the transaction table (if you guessed this was supposed to be noSQL data, you guessed right) and, if the commodity code already exists in the commodity table, not insert it again.
The group bys are absolutely needed to get around duplicates, since several transactions may share the same commodity. The commodity code should be unique in the commodities table, but currently it is not, though if it helps, we could possibly alter it.
What can I do to speed it up?
INSERT INTO STAGING.dbo.commodities_copy
(commodity_code,
short_description_sched_b,
short_description_sched_tsusa,
long_description_sched_b,
long_description_sched_tsusa,
measurement_unit_1_sched_b,
measurement_unit_1_sched_tsusa,
measurement_unit_2_sched_b,
measurement_unit_2_sched_tsusa,
end_use_sched_b,
end_use_sched_tsusa,
year,
created_by,
created_on,
taxable_sched_b,
taxable_sched_tsusa,
non_taxable_sched_b,
non_taxable_sched_tsusa,
fk_sic_sched_b,
fk_sic_sched_tsusa,
chapter,
header,
sub_header,
needs_validation)
SELECT
--Distinct
Commodity_Code,
iif(miob2.DESC_COMM is null, UPPER(socrata.Commodity_Short_Name), miob2.DESC_COMM) as short_commmodity_description_b,
iif(mio2tsusa.DESC_COMM is null, UPPER(socrata.Commodity_Short_Name), mio2tsusa.DESC_COMM) as short_commmodity_description_tsusa,
socrata.Commodity_description as long_commodity_description_b,
socrata.Commodity_description as long_commodity_description_tsusa,
iif(miob2.UNIDAD is null, socrata.unit_1, miob2.UNIDAD) as unit_1_b,
iif(mio2tsusa.UNIDAD is null, socrata.unit_1, mio2tsusa.UNIDAD) as unit_1_tsusa,
MAX(socrata.unit_2) as unit_2_b,
MAX(socrata.unit_2) as unit_2_tsusa,
socrata.end_use_e as end_use_b,
socrata.end_use_i as end_use_tsusa,
MAX(socrata.[year]),
'system' as created_by,
getdate() as created_on,
miob.TRIBUTA as taxable_b,
miotsusa.TRIBUTA as taxable_tsusa,
miob.NTRIBUTA as non_taxable_b,
miotsusa.NTRIBUTA as non_taxable_tsusa,
sicb.id as sic_id_b,
sictsusa.id as sic_id_tsusa,
SUBSTRING(Commodity_Code, 1, 2) as chapter,
SUBSTRING(Commodity_Code, 1, 4) as header,
SUBSTRING(Commodity_Code, 1, 6) as sub_header,
0 as needs_validation
FROM PRE_STAGING.dbo.TRANSACTIONS_FROM_SOCRATA socrata
Left join PRE_STAGING.DBO.MIOB_TBL miob ON miob.COMM=socrata.Commodity_Code
Left join PRE_STAGING.dbo.MSCHB_TBL miob2 ON miob2.COMM=socrata.Commodity_Code
Left join PRE_STAGING.dbo.MIOTSUSA_TBL miotsusa ON miotsusa.COMM=socrata.Commodity_Code
Left join PRE_STAGING.dbo.MTSUSA_TBL mio2tsusa ON mio2tsusa.COMM=socrata.Commodity_Code
Left join STAGING.dbo.sics_altered sicb ON sicb.sic_code = miob.SIC
Left join STAGING.dbo.sics_altered sictsusa ON sictsusa.sic_code = miotsusa.SIC
WHERE NOT EXISTS
(Select Distinct commodity_code from STAGING.dbo.commodities_copy)
group by
Commodity_Code,
iif(miob2.DESC_COMM is null, UPPER(socrata.Commodity_Short_Name), miob2.DESC_COMM),
iif(mio2tsusa.DESC_COMM is null, UPPER(socrata.Commodity_Short_Name), mio2tsusa.DESC_COMM),
socrata.Commodity_description,
socrata.Commodity_description,
iif(miob2.UNIDAD is null, socrata.unit_1, miob2.UNIDAD),
iif(mio2tsusa.UNIDAD is null, socrata.unit_1, mio2tsusa.UNIDAD),
socrata.end_use_e,
socrata.end_use_i,
miob.TRIBUTA,
miotsusa.TRIBUTA,
miob.NTRIBUTA,
miotsusa.NTRIBUTA,
sicb.id,
sictsusa.id,
SUBSTRING(Commodity_Code, 1, 2),
SUBSTRING(Commodity_Code, 1, 4),
SUBSTRING(Commodity_Code, 1, 6)
The tables used are the following:
STAGING.dbo.commodities_copy:
CREATE TABLE [dbo].[commodities_copy](
[id] [bigint] IDENTITY(1,1) NOT NULL,
[chapter] [varchar](5) NULL,
[header] [varchar](5) NULL,
[sub_header] [varchar](10) NULL,
[commodity_code] [varchar](20) NULL,
[short_description_sched_b] [varchar](100) NULL,
[long_description_sched_b] [varchar](200) NULL,
[measurement_unit_1_sched_b] [varchar](5) NULL,
[measurement_unit_2_sched_b] [varchar](5) NULL,
[end_use_sched_b] [int] NULL,
[sitc_sched_b] [varchar](20) NULL,
[usda_sched_b] [int] NULL,
[hitech_sched_b] [int] NULL,
[naics_fk_id_sched_b] [bigint] NULL,
[short_description_sched_tsusa] [varchar](100) NULL,
[long_description_sched_tsusa] [varchar](200) NULL,
[measurement_unit_1_sched_tsusa] [varchar](5) NULL,
[measurement_unit_2_sched_tsusa] [varchar](5) NULL,
[end_use_sched_tsusa] [int] NULL,
[sitc_sched_tsusa] [varchar](20) NULL,
[usda_sched_tsusa] [int] NULL,
[hitech_sched_tsusa] [int] NULL,
[naics_fk_id_sched_tsusa] [bigint] NULL,
[year] [int] NOT NULL,
[created_on] [datetime] NOT NULL,
[created_by] [varchar](50) NULL,
[updated_on] [datetime] NULL,
[updated_by] [varchar](50) NULL,
[needs_validation] [bit] NOT NULL,
[taxable_sched_b] [nchar](3) NULL,
[non_taxable_sched_b] [nchar](3) NULL,
[taxable_sched_tsusa] [nchar](3) NULL,
[non_taxable_sched_tsusa] [nchar](3) NULL,
[fk_sic_sched_b] [bigint] NULL,
[fk_sic_sched_tsusa] [bigint] NULL
) ON [PRIMARY]
STAGING.dbo.sics_altered:
CREATE TABLE [dbo].[sics_altered](
[id] [bigint] IDENTITY(1,1) NOT NULL,
[sic_code] [varchar](4) NULL,
[sic_description] [varchar](max) NULL,
[created_on] [datetime] NOT NULL,
[created_by] [varchar](50) NOT NULL,
PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
The rest are in PRESTAGING:
PRESTAGING.dbo.TRANSACTIONS_FROM_SOCRATA:
This is the table with 1.3 million rows
CREATE TABLE [dbo].[TRANSACTIONS_FROM_SOCRATA](
[Trade] [varchar](255) NULL,
[Year] [varchar](255) NULL,
[Month] [varchar](50) NULL,
[Commodity_Code] [varchar](50) NULL,
[Commodity_Short_Name] [varchar](255) NULL,
[Commodity_description] [varchar](255) NULL,
[cty_code] [varchar](50) NULL,
[Country] [varchar](50) NULL,
[Subcountry_code] [varchar](50) NULL,
[district] [varchar](50) NULL,
[dist_name] [varchar](255) NULL,
[data] [varchar](50) NULL,
[sitc] [varchar](50) NULL,
[SITC_Short_Desc] [varchar](255) NULL,
[SITC_Long_Desc] [varchar](255) NULL,
[naics] [varchar](50) NULL,
[NAICS_description] [varchar](255) NULL,
[end_use_i] [varchar](50) NULL,
[end_use_e] [varchar](50) NULL,
[hts_desc] [varchar](255) NULL,
[unit_1] [varchar](50) NULL,
[qty_1] [varchar](50) NULL,
[unit_2] [varchar](50) NULL,
[qty_2] [varchar](50) NULL,
[ves_val_mo] [varchar](50) NULL,
[ves_wgt_mo] [varchar](50) NULL,
[cards_mo] [varchar](50) NULL,
[air_val_mo] [varchar](50) NULL,
[air_wgt_mo] [varchar](50) NULL,
[dut_val_mo] [varchar](50) NULL,
[cal_dut_mo] [varchar](50) NULL,
[con_cha_mo] [varchar](50) NULL,
[con_cif_mo] [varchar](50) NULL,
[gen_val_mo] [varchar](50) NULL,
[gen_cha_mo] [varchar](50) NULL,
[gen_cif_mo] [varchar](50) NULL,
[air_cha_mo] [varchar](50) NULL,
[ves_cha_mo] [varchar](50) NULL,
[cnt_cha_mo] [varchar](50) NULL,
[rev_data] [varchar](50) NULL
) ON [PRIMARY]
PRESTAGING.dbo.MIOB_TBL:
CREATE TABLE [dbo].[MIOB_TBL](
[id] [int] IDENTITY(1,1) NOT NULL,
[COMM] [nchar](10) NOT NULL,
[INSUMO] [nchar](3) NULL,
[PBTO] [nchar](4) NULL,
[SIC] [nchar](4) NULL,
[NAICS] [nchar](6) NULL,
[TRIBUTA] [nchar](3) NULL,
[NTRIBUTA] [nchar](3) NULL,
[LAST_UPDATE] [date] NULL,
[LAST_UPDATED_BY] [nchar](20) NULL,
[CREATION_DATE] [date] NULL,
[CREATED_BY] [nchar](15) NULL,
[migrated_on] [datetime] NOT NULL,
PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
PRESTAGING.dbo.MIOTSUSA_TBL:
CREATE TABLE [dbo].[MIOTSUSA_TBL](
[COMM] [nchar](10) NOT NULL,
[INSUMO] [nchar](3) NULL,
[PBTO] [nchar](4) NULL,
[SIC] [nchar](4) NULL,
[NAICS] [nchar](6) NULL,
[TRIBUTA] [nchar](3) NULL,
[NTRIBUTA] [nchar](3) NULL,
[id] [int] IDENTITY(1,1) NOT NULL,
[migrated_on] [datetime] NOT NULL
) ON [PRIMARY]
PRESTAGING.dbo.MSCHB_TBL:
CREATE TABLE [dbo].[MSCHB_TBL](
[id] [int] IDENTITY(1,1) NOT NULL,
[COMM] [nchar](10) NOT NULL,
[DESC_COMM] [nchar](50) NULL,
[UNIDAD] [nchar](3) NULL,
[LAST_UPDATE] [date] NULL,
[LAST_UPDATED_BY] [nchar](20) NULL,
[CREATION_DATE] [date] NULL,
[CREATED_BY] [nchar](15) NULL,
[migrated_on] [datetime] NOT NULL,
PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
PRESTAGING.dbo.MTSUSA_TBL
CREATE TABLE [dbo].[MTSUSA_TBL](
[COMM] [nchar](10) NOT NULL,
[DESC_COMM] [nchar](50) NULL,
[UNIDAD] [nchar](3) NULL,
[id] [int] IDENTITY(1,1) NOT NULL,
[migrated_on] [datetime] NOT NULL
) ON [PRIMARY]
Let me know if there's anything else I need to provide.
With all those left outer joins, the query optimizer has to start with TRANSACTIONS_FROM_SOCRATA, so I would start there. The only filtering is the existence check against commodities_copy: would that cut the 1MM rows down to something more reasonable? If not, you're pretty much doomed to running at least one table scan (and possibly several) on the entire table.
If filtering on Commodity_Code would cut things down significantly, SQL Server can find and read only those rows when the column is indexed; otherwise you're back to a table scan. Similarly, an index on commodity_code in table commodities_copy would help as well, if that table is large.
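As a rough sketch of the two indexes mentioned above (the index names are made up for illustration, and PRE_STAGING is spelled as it is in the query), something like this might be a starting point:

```sql
-- Hypothetical index names; adjust to your own naming convention.

-- Lets the existence check and the grouping read Commodity_Code from a narrow
-- index instead of scanning the full-width transaction rows.
CREATE NONCLUSTERED INDEX IX_TRANSACTIONS_FROM_SOCRATA_Commodity_Code
    ON PRE_STAGING.dbo.TRANSACTIONS_FROM_SOCRATA (Commodity_Code);

-- Supports the lookup by commodity_code once commodities_copy holds data.
CREATE NONCLUSTERED INDEX IX_commodities_copy_commodity_code
    ON STAGING.dbo.commodities_copy (commodity_code);
```

Whether the optimizer actually uses these depends on the data, so compare the execution plan before and after. Note also that the COMM columns are nchar(10) while Commodity_Code is varchar(50), so the joins involve an implicit conversion; that is worth checking in the plan too.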
As discussed in the comments, a NOT EXISTS check would be most efficient, written as a correlated subquery:
WHERE NOT EXISTS (select commodity_code
from STAGING.dbo.commodities_copy
where commodity_code = socrata.Commodity_Code)
(I'd want to do a lot of testing on this, checking and double-checking everything. Improving performance is tricky, doubly so when done through SO.)
Try this,
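-- Collapse the transaction rows to one row per distinct commodity first.
-- The temp table carries every socrata column that the second query reads.
create table #socrata(Commodity_Code varchar(100),Commodity_Short_Name varchar(255),Commodity_description varchar(255),
unit_1 varchar(50),end_use_e varchar(50),end_use_i varchar(50),
unit_2_b varchar(50),unit_2_tsusa varchar(50),[year] varchar(50))
insert into #socrata
SELECT
Commodity_Code, Commodity_Short_Name, Commodity_description,
unit_1, end_use_e, end_use_i,
MAX(socrata.unit_2) as unit_2_b,
MAX(socrata.unit_2) as unit_2_tsusa,
MAX(socrata.[year])
FROM PRE_STAGING.dbo.TRANSACTIONS_FROM_SOCRATA socrata
group by Commodity_Code, Commodity_Short_Name, Commodity_description,
unit_1, end_use_e, end_use_i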
SELECT
--Distinct
Commodity_Code,
iif(miob2.DESC_COMM is null, UPPER(socrata.Commodity_Short_Name), miob2.DESC_COMM) as short_commmodity_description_b,
iif(mio2tsusa.DESC_COMM is null, UPPER(socrata.Commodity_Short_Name), mio2tsusa.DESC_COMM) as short_commmodity_description_tsusa,
socrata.Commodity_description as long_commodity_description_b,
socrata.Commodity_description as long_commodity_description_tsusa,
iif(miob2.UNIDAD is null, socrata.unit_1, miob2.UNIDAD) as unit_1_b,
iif(mio2tsusa.UNIDAD is null, socrata.unit_1, mio2tsusa.UNIDAD) as unit_1_tsusa,
unit_2_b,
unit_2_tsusa,
socrata.end_use_e as end_use_b,
socrata.end_use_i as end_use_tsusa,
[year],
'system' as created_by,
getdate() as created_on,
miob.TRIBUTA as taxable_b,
miotsusa.TRIBUTA as taxable_tsusa,
miob.NTRIBUTA as non_taxable_b,
miotsusa.NTRIBUTA as non_taxable_tsusa,
sicb.id as sic_id_b,
sictsusa.id as sic_id_tsusa,
SUBSTRING(Commodity_Code, 1, 2) as chapter,
SUBSTRING(Commodity_Code, 1, 4) as header,
SUBSTRING(Commodity_Code, 1, 6) as sub_header,
0 as needs_validation
FROM #socrata socrata
Left join PRE_STAGING.DBO.MIOB_TBL miob ON miob.COMM=socrata.Commodity_Code
Left join PRE_STAGING.dbo.MSCHB_TBL miob2 ON miob2.COMM=socrata.Commodity_Code
Left join PRE_STAGING.dbo.MIOTSUSA_TBL miotsusa ON miotsusa.COMM=socrata.Commodity_Code
Left join PRE_STAGING.dbo.MTSUSA_TBL mio2tsusa ON mio2tsusa.COMM=socrata.Commodity_Code
Left join STAGING.dbo.sics_altered sicb ON sicb.sic_code = miob.SIC
Left join STAGING.dbo.sics_altered sictsusa ON sictsusa.sic_code = miotsusa.SIC
WHERE NOT EXISTS
(Select commodity_code from STAGING.dbo.commodities_copy where commodity_code = socrata.Commodity_Code)
If reading uncommitted data is not a concern, you can also add the WITH (NOLOCK) hint.
Also, your EXISTS clause was wrong (it was not correlated to the outer query), and there is no need for DISTINCT inside it. Check the rest of the changes.
I am using SQL Server version 2012. I have a table which has more than 10 million rows. I have to count records using a SQL filter.
My query is this:
select count(*)
from reconcil
where tenantid = 101
which takes more than 5 minutes for 5 million records.
Is there a faster way to count records?
The Reconcil table structure is:
CREATE TABLE [dbo].[RECONCIL]
(
[AckCode] [nvarchar](50) NULL,
[AckExpireTime] [int] NULL,
[AckFileName] [nvarchar](255) NULL,
[AckKey] [int] NULL,
[AckState] [int] NULL,
[AppMsgKey] [nvarchar](30) NULL,
[CurWrkActID] [nvarchar](50) NULL,
[Date_Time] [datetime] NULL,
[Direction] [nvarchar](1) NULL,
[ErrorCode] [nvarchar](50) NULL,
[FGLOGKEY] [int] NOT NULL,
[FolderID] [int] NULL,
[FuncGCtrlNo] [nvarchar](14) NULL,
[INLOGKEY] [int] NULL,
[InputFileName] [nvarchar](255) NULL,
[IntCtrlNo] [nvarchar](14) NULL,
[IsAssoDataPresent] [nvarchar](1) NULL,
[JobState] [int] NULL,
[LOGDATA] [nvarchar](max) NULL,
[MessageID] [nvarchar](25) NULL,
[MessageState] [int] NULL,
[MessageType] [int] NULL,
[NextWrkActID] [nvarchar](50) NULL,
[NextWrkHint] [nvarchar](20) NULL,
[NONFAERRORLOG] [nvarchar](max) NULL,
[NumberOfBytes] [int] NULL,
[NumberOfSegments] [int] NULL,
[OutputFileName] [nvarchar](255) NULL,
[Priority] [nvarchar](1) NULL,
[ReceiverID] [nvarchar](30) NULL,
[RecNo] [int] NULL,
[RecordID] [int] IDENTITY(1,1) NOT NULL,
[RelationKey] [int] NULL,
[SEGLOG] [nvarchar](max) NULL,
[SenderID] [nvarchar](30) NULL,
[ServerID] [nvarchar](255) NULL,
[Standard] [int] NULL,
[TenantID] [int] NULL,
[TPAgreementKey] [int] NULL,
[TSetCtrlNo] [nvarchar](35) NULL,
[UserKey1] [nvarchar](255) NULL,
[UserKey2] [nvarchar](255) NULL,
[UserKey3] [nvarchar](255) NULL,
CONSTRAINT [RECONCIL_PK]
PRIMARY KEY CLUSTERED ([RecordID] ASC)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Unless you materialize the count, this non-clustered index on TenantID will provide better performance because it is narrower than the clustered primary key index and lets the query scan only the matching rows:
CREATE INDEX idx ON [dbo].[RECONCIL](TenantID);
If performance of the aggregate query with this index isn't acceptable, you could create an indexed view with the count. The indexed view will provide the fastest performance for this query but will incur additional costs for storage and index maintenance for inserts and deletes. Also, queries that modify the table must have required SET options for indexed views. Those costs may be justified if the count query is executed often.
SQL Server can use the indexed view automatically in Enterprise (or Developer) editions even if not directly referenced in the query as long as the optimizer can match the semantics of the query using the view. In lesser editions, you'll need to query the indexed view directly and specify the NOEXPAND hint.
CREATE VIEW dbo.VW_RECONCIL_COUNT
WITH SCHEMABINDING
AS
SELECT
TenantID
, COUNT_BIG(*) AS TenentRowCount
FROM [dbo].[RECONCIL]
GROUP BY TenantID;
GO
CREATE UNIQUE CLUSTERED INDEX cdx ON dbo.VW_RECONCIL_COUNT(TenantID);
GO
--Enterprise Edition can use the view index automatically
SELECT COUNT_BIG(*) AS TenentRowCount
FROM [dbo].[RECONCIL]
WHERE TenantID = 101
GROUP BY TenantID;
GO
--other editions require the view to be specified plus the NOEXPAND hint
SELECT TenentRowCount
FROM dbo.VW_RECONCIL_COUNT WITH (NOEXPAND)
WHERE TenantID = 101;
GO
As already suggested, create an index, or even partition the table by TenantID if you have that many rows. Each partition can then be placed on its own filegroup, which may improve performance.
select count(tenantid)
from reconcil
where tenantid = 101 group by tenantid ;
Not sure, but try this.
This is maddening! Code in question has been running for over 5 years.
Here's the scoop....
I am doing an INSERT...SELECT into a table with a primary key that is an identity column. I do not specify the key when I insert - SQL Server generates it as expected.
I am doing the insert in a stored procedure that I call in a loop (for loop in SSIS, actually). The stored procedure will insert rows in batches (configurable). It might insert 1000 rows at a time or it might insert 50,000 - doesn't matter. It will work for a random number of calls (inserting thousands of rows) and then it will fail, out of the blue, with a
Violation of primary key / duplicate
error. If I check the identity seed - it is correct. If I kick off the process again it will work fine, for a while.
The values being inserted are coming from 2 tables that I join together, as if that matters.
The bulk of my code is below:
WHILE @pk <= @max_pk
BEGIN
INSERT INTO tbl_claim_line (fk_batch_control_group, fk_claim, fk_provider, service_from_date, service_to_date, allowed, net_paid, COB, flex_1, flex_2, flex_3, flex_4)
SELECT
@fk_batch_control_group
, c.pk_claim
, p.pk_provider
, i.date_of_service_from
, i.date_of_service_to
, i.allowed_amount
, i.net_paid_amount
, i.cob_amount
, i.claimline_flex_1
, i.claimline_flex_2
, i.claimline_flex_3
, i.claimline_flex_4
FROM
tbl_import i
INNER JOIN
tbl_import__claim c ON i.claim_number = c.claim_number
LEFT JOIN
tbl_import__provider p ON ISNULL(i.provider_type,'') = ISNULL(p.provider_type,'')
AND ISNULL(i.provider_specialty,'') = ISNULL(p.provider_specialty,'')
AND ISNULL(i.provider_zip_code,'') = ISNULL(p.provider_zip_code,'')
WHERE
pk_import = @pk
UPDATE tbl_import
SET fk_claim_line = SCOPE_IDENTITY()
WHERE pk_import = @pk
SET @pk += 1
END
--TABLE DEFINITIONS...
CREATE TABLE [dbo].[tbl_claim_line](
[fk_batch_control_group] [int] NOT NULL,
[fk_claim] [int] NOT NULL,
[fk_provider] [int] NULL,
[service_from_date] [date] NULL,
[service_to_date] [date] NULL,
[allowed] [money] NULL,
[net_paid] [money] NULL,
[COB] [money] NULL,
[flex_1] [varchar](200) NULL,
[flex_2] [varchar](200) NULL,
[flex_3] [varchar](200) NULL,
[flex_4] [varchar](200) NULL,
[pk_claim_line] [int] IDENTITY(1,1) NOT NULL,
[insert_date] [datetime] NOT NULL,
CONSTRAINT [PK_tbl_claim_line] PRIMARY KEY NONCLUSTERED
(
[pk_claim_line] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[tbl_claim_line] WITH CHECK
ADD CONSTRAINT [FK_tbl_claim_line_tbl_batch_control_group]
FOREIGN KEY([fk_batch_control_group])
REFERENCES [dbo].[tbl_batch_control_group] ([pk_batch_control_group])
GO
ALTER TABLE [dbo].[tbl_claim_line] CHECK CONSTRAINT [FK_tbl_claim_line_tbl_batch_control_group]
GO
ALTER TABLE [dbo].[tbl_claim_line] WITH CHECK
ADD CONSTRAINT [FK_tbl_claim_line_tbl_claim]
FOREIGN KEY([fk_claim])
REFERENCES [dbo].[tbl_claim] ([pk_claim])
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[tbl_claim_line] CHECK CONSTRAINT [FK_tbl_claim_line_tbl_claim]
GO
ALTER TABLE [dbo].[tbl_claim_line] WITH CHECK
ADD CONSTRAINT [FK_tbl_claim_line_tbl_provider]
FOREIGN KEY([fk_provider])
REFERENCES [dbo].[tbl_provider] ([pk_provider])
GO
ALTER TABLE [dbo].[tbl_claim_line] CHECK CONSTRAINT [FK_tbl_claim_line_tbl_provider]
GO
ALTER TABLE [dbo].[tbl_claim_line] ADD CONSTRAINT [DF_tbl_claim_line__insert_date] DEFAULT (getdate()) FOR [insert_date]
GO
----second table
CREATE TABLE [dbo].[tbl_import](
[fk_claim_line] [int] NULL,
[member_id] [varchar](50) NULL,
[member_card_id] [varchar](50) NULL,
[member_first_name] [varchar](50) NULL,
[member_last_name] [varchar](50) NULL,
[member_dob] [varchar](50) NULL,
[member_gender] [varchar](50) NULL,
[member_subscriber_relationship_code] [varchar](50) NULL,
[member_address_line_1] [varchar](100) NULL,
[member_address_line_2] [varchar](100) NULL,
[member_city] [varchar](50) NULL,
[member_state] [varchar](50) NULL,
[member_zip] [varchar](50) NULL,
[member_phone] [varchar](50) NULL,
[member_email] [varchar](50) NULL,
[subscriber_id] [varchar](50) NULL,
[group_line_of_business] [varchar](50) NULL,
[group_product] [varchar](50) NULL,
[group_employer] [varchar](50) NULL,
[provider_first_name] [varchar](50) NULL,
[provider_last_or_full_name] [varchar](200) NULL,
[provider_type] [varchar](200) NULL,
[provider_specialty] [varchar](400) NULL,
[provider_zip_code] [varchar](50) NULL,
[provider_tax_id] [varchar](50) NULL,
[medical_code_1] [varchar](10) NULL,
[medical_code_1_description] [varchar](500) NULL,
[medical_code_2] [varchar](10) NULL,
[medical_code_2_description] [varchar](500) NULL,
[medical_code_3] [varchar](10) NULL,
[medical_code_3_description] [varchar](500) NULL,
[medical_code_4] [varchar](10) NULL,
[medical_code_4_description] [varchar](500) NULL,
[medical_code_5] [varchar](10) NULL,
[medical_code_5_description] [varchar](500) NULL,
[medical_code_6] [varchar](10) NULL,
[medical_code_6_description] [varchar](500) NULL,
[medical_code_7] [varchar](10) NULL,
[medical_code_7_description] [varchar](500) NULL,
[medical_code_8] [varchar](10) NULL,
[medical_code_8_description] [varchar](500) NULL,
[medical_code_9] [varchar](10) NULL,
[medical_code_9_description] [varchar](500) NULL,
[medical_code_10] [varchar](10) NULL,
[medical_code_10_description] [varchar](500) NULL,
[medical_code_11] [varchar](10) NULL,
[medical_code_11_description] [varchar](500) NULL,
[medical_code_12] [varchar](10) NULL,
[medical_code_12_description] [varchar](500) NULL,
[medical_code_13] [varchar](10) NULL,
[medical_code_13_description] [varchar](500) NULL,
[medical_code_14] [varchar](10) NULL,
[medical_code_14_description] [varchar](500) NULL,
[medical_code_15] [varchar](10) NULL,
[medical_code_15_description] [varchar](500) NULL,
[medical_code_16] [varchar](10) NULL,
[medical_code_16_description] [varchar](500) NULL,
[date_of_service_from] [varchar](50) NULL,
[date_of_service_to] [varchar](50) NULL,
[claim_number] [varchar](50) NULL,
[claim_line_number] [varchar](50) NULL,
[original_claim_number] [varchar](50) NULL,
[allowed_amount] [varchar](50) NULL,
[net_paid_amount] [varchar](50) NULL,
[cob_amount] [varchar](50) NULL,
[date_paid] [varchar](50) NULL,
[member_flex_1] [varchar](200) NULL,
[member_flex_2] [varchar](200) NULL,
[member_flex_3] [varchar](200) NULL,
[member_flex_4] [varchar](200) NULL,
[claim_flex_1] [varchar](200) NULL,
[claim_flex_2] [varchar](200) NULL,
[claim_flex_3] [varchar](200) NULL,
[claim_flex_4] [varchar](200) NULL,
[claimline_flex_1] [varchar](200) NULL,
[claimline_flex_2] [varchar](200) NULL,
[claimline_flex_3] [varchar](200) NULL,
[claimline_flex_4] [varchar](200) NULL,
[pk_import] [int] IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_tbl_import] PRIMARY KEY NONCLUSTERED
(
[pk_import] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
I ran into this and much like user3170349, it was a seed issue on the column. I'm adding some additional info, however.
First, you can run this to figure out if you have a seed problem:
DBCC CHECKIDENT ('TABLE_NAME_GOES_HERE', NORESEED);
This will give you information which will read something like this:
Checking identity information: current identity value 'XXXX', current column value 'YYYY'.
If YYYY is larger than XXXX, then you have a problem and need to RESEED the table to get things going again. You can do so with the following command:
DBCC CHECKIDENT ('TABLE_NAME_GOES_HERE', RESEED, ZZZZ);
Where ZZZZ is the reseed value. That value should be at least one higher than YYYY. YMMV, so pick a value that is appropriate for your situation.
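For the table in this question, a minimal sketch of the check-and-repair sequence could look like the following. It relies on the documented behaviour that DBCC CHECKIDENT with RESEED and no explicit value corrects the current identity value only when it is lower than the maximum value stored in the column. Try it in a test environment first.

```sql
-- Report only: shows the current identity value and the current column value.
DBCC CHECKIDENT ('dbo.tbl_claim_line', NORESEED);

-- Cross-check the highest key actually stored in the table.
SELECT MAX(pk_claim_line) AS max_pk_claim_line
FROM dbo.tbl_claim_line;

-- No explicit value given: the identity value is corrected only if it is
-- lower than the maximum value stored in pk_claim_line.
DBCC CHECKIDENT ('dbo.tbl_claim_line', RESEED);
```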
"Code in question has been running for over 5 years."
"It might insert 1000 records at a time or it might insert 50,000 "
Is it possible you have finally overflowed the integer type of the primary key?
Did it wrap around and is now starting over? That would cause duplicate primary keys.
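One way to rule this out is to compare the current identity value with the upper bound of int. This is only a sketch using the table name from the question. As far as I know, an int IDENTITY column that runs out of values raises an arithmetic overflow error rather than wrapping around, so an overflow would normally surface as a different error message.

```sql
-- 2147483647 is the largest value an int column can hold.
SELECT IDENT_CURRENT('dbo.tbl_claim_line') AS current_identity_value,
       2147483647 - IDENT_CURRENT('dbo.tbl_claim_line') AS values_left_before_int_limit;
```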
I have a real-world SQL optimization problem that I think is a typical case and may help a lot of people.
The server is SQL Server 2005.
First, create the main table. This is a person info table.
CREATE TABLE [dbo].[OLAPAgentDim](
[RoleID] [varchar](50) NULL CONSTRAINT [DF_OLAPAgentDim_RoleID] DEFAULT ((1)),
[OLAPKey] [bigint] IDENTITY(1,1) NOT NULL,
[FatherKey] [bigint] NULL,
[FatherKeyValue] [nvarchar](100) NULL,
[System] [varchar](6) NULL,
[Level] [int] NULL,
[IfLeaf] [real] NULL,
[IfDel] [real] NULL CONSTRAINT [DF_OLAPAgentDim_IfDel] DEFAULT ((0)),
[SourceKey] [varchar](50) NULL,
[MainDemoName] [nvarchar](100) NULL,
[FastCode] [varchar](50) NULL,
[TagValue] [varchar](50) NULL,
[Script] [nvarchar](max) NULL,
[Birthday] [datetime] NULL,
[EarlyStartTime] [datetime] NULL,
[StartTime] [datetime] NULL,
[EndTime] [datetime] NULL,
[EditTime] [datetime] NULL,
[BecomesTime] [datetime] NULL,
[ContractTime] [datetime] NULL,
[ContractEndTime] [datetime] NULL,
[XMLIcon] [nvarchar](max) NULL,
[PassKey] [varchar](50) NULL CONSTRAINT [DF_OLAPAgentDim_PassKey] DEFAULT ('N3pkY3RHaeZXA9mGJdfm8A=='),
[Address] [nvarchar](100) NULL,
[HomeTel] [varchar](50) NULL,
[Mobile] [varchar](50) NULL,
[Email] [varchar](100) NULL,
[IDCard] [varchar](50) NULL,
[IDSecu] [varchar](50) NULL,
[IDEndowment] [varchar](50) NULL,
[IDAccumulation] [varchar](50) NULL,
[ContactPerson] [nvarchar](100) NULL,
[ContactPersonTel] [varchar](50) NULL,
[Others1] [varchar](50) NULL,
[SexKey] [varchar](2) NULL CONSTRAINT [DF_OLAPAgentDim_SexKey] DEFAULT ((1)),
[SexKeyValue] [nvarchar](100) NULL,
[MarrageKey] [varchar](2) NULL CONSTRAINT [DF_OLAPAgentDim_MarrageKey] DEFAULT ((1)),
[MarrageKeyValue] [nvarchar](100) NULL,
[Nation] [nvarchar](50) NULL,
[Race] [nvarchar](50) NULL,
[PartyMemberKey] [varchar](2) NULL CONSTRAINT [DF_OLAPAgentDim_PartyMemberKey] DEFAULT ((1)),
[PartyMemberKeyValue] [nvarchar](100) NULL,
[RegionKey] [bigint] NULL CONSTRAINT [DF_OLAPAgentDim_RegionKey] DEFAULT ((1)),
[RegionKeyValue] [nvarchar](100) NULL,
[LeaveResonKey] [bigint] NULL CONSTRAINT [DF_OLAPAgentDim_LeaveResonKey] DEFAULT ((1)),
[LeaveResonKeyValue] [nvarchar](100) NULL,
[RoleStr] [varchar](max) NULL,
[RoleStrValue] [nvarchar](max) NULL,
[LeaderKey] [bigint] NULL CONSTRAINT [DF_OLAPAgentDim_LeaderKey] DEFAULT ((1)),
[LeaderKeyValue] [nvarchar](100) NULL,
[FastCode2] [varchar](50) NULL,
[FastCode3] [varchar](50) NULL,
[FastCode4] [varchar](50) NULL,
[FastCode5] [varchar](50) NULL,
[OtherAddress] [nvarchar](100) NULL,
[ShowOrder] [int] NULL,
[RaceKey] [bigint] NULL DEFAULT ((1)),
[RaceKeyValue] [nvarchar](100) NULL,
[DepartLevelKey] [bigint] NULL DEFAULT ((1)),
[DepartLevelKeyValue] [nvarchar](100) NULL,
[forumname] [nvarchar](100) NULL,
[IfCloseKey] [bigint] NULL DEFAULT ((1)),
[IfCloseKeyValue] [nvarchar](100) NULL,
[InsureStartTime] [datetime] NULL,
[AccumulationStartTime] [datetime] NULL,
[Rate] [varchar](50) NULL,
[DirectLeaderKey] [bigint] NULL CONSTRAINT [DF_OLAPAgentDim_DirectLeaderKey] DEFAULT ((1)),
[DirectLeaderAttriKey] [bigint] NULL CONSTRAINT [DF_OLAPAgentDim_DirectLeaderAttriKey] DEFAULT ((1)),
[DirectLeaderKeyValue] [nvarchar](100) NULL,
[DirectLeaderSourceKey] [varchar](50) NULL,
[DirectLeaderPartName] [nvarchar](100) NULL,
[DirectLeaderPositionName] [nvarchar](100) NULL,
[NOTSync] [int] NULL,
[FatherPath] [nvarchar](max) NULL,
[SaleDiscount] [real] NULL,
CONSTRAINT [PK_OLAPAgent Dim] PRIMARY KEY CLUSTERED
(
[OLAPKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Second, insert about 10,000 records into the table. I don't think 10,000 records is a very big number for SQL Server. This is in fact a parent-child dimension table. Records with IfLeaf=0 are department structure nodes; records with IfLeaf=1 are persons. The parent-child relationship is defined through the FatherKey column. For example:
OLAPKey IfLeaf FatherKey DepartLevelKey MainDemoName
2 0 0 1 IBM Company
3 0 2 2 Sales Depart
4 0 2 2 Service Depart
5 0 3 3 Sales Team1
6 1 5 NULL John Smith
7 1 4 NULL Mary
......
The DepartLevelKey column holds the department node's level.
So this table can store the whole HR tree.
Third, here is the problem SQL:
create table #t
(
TableID int IDENTITY(1,1),
OLAPKey bigint,
MainDemoName nvarchar(max)
)
declare @t4 table
(
TableID int IDENTITY(1,1),
MainDemoName nvarchar(max),
OLAPKeystr varchar(100)
)
declare @agentkey bigint
set @agentkey ='2'
--Part A
--DepartLevelKey=2, to get @agentkey node's all level=2 department
;WITH Result AS(
SELECT OLAPKey,DepartLevelKey,maindemoname FROM OLAPAgentDim WHERE OLAPKey =@agentkey
UNION ALL
SELECT a.OLAPKey,a.DepartLevelKey,a.maindemoname FROM OLAPAgentDim AS a,Result AS b WHERE a.FatherKey = b.OLAPKey
)
insert #t select OLAPKey,maindemoname from Result where DepartLevelKey=4
--Part B
;with One as
(
select *,convert(varchar(50),OLAPKey) as Re from #t
)
insert @t4 select maindemoname,stuff((select ','+Re from One where One.maindemoname=#t.maindemoname for xml path('')),1,1,'') as Two
from #t
group by maindemoname
drop table #t
The SQL above is divided into Part A and Part B.
Part A gets all the children below a root node (filtered to those that belong to the specified DepartLevelKey). For example, it gets all persons in the Sales Department's child departments with level=3.
Part B turns the rows into a comma-separated column. For example:
Change:
TableID OLAPKey MainDemoName
1 6 Sales Team1
2 10 Sales Team1
3 12 Sales Team1
to:
TableID MainDemoName OLAPKeystr
1 Sales Team1 6,10,12
Thus we get each target department's persons for further processing (omitted here).
The problem:
Part A is very slow and takes about 5 minutes. Part B is slow too.
I wonder how to optimize it given the existing table structure.
yours,
Ivan
Try:
(i) Adding this index to OLAPAgentDim:
create index IX_OLAPAgentDim_FatherKey on OLAPAgentDim (FatherKey) include (DepartLevelKey, MainDemoName)
(ii) Changing MainDemoName in #t from nvarchar(max) to nvarchar(100). This matches the column definition in OLAPAgentDim.
(iii) Between Part A and Part B, i.e. after Part A and before Part B, adding this index to #t:
create clustered index IX on #t (MainDemoName)
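Put together, the three changes might slot into the original script roughly as follows. This is a sketch, not a tested rewrite: the CTE logic is taken from the question (including its DepartLevelKey=4 filter), and only the places marked (i), (ii) and (iii) differ.

```sql
-- (i) One-time change on the base table: lets the recursive step seek on FatherKey.
CREATE INDEX IX_OLAPAgentDim_FatherKey
    ON OLAPAgentDim (FatherKey) INCLUDE (DepartLevelKey, MainDemoName);
GO

-- (ii) nvarchar(100) instead of nvarchar(max), matching OLAPAgentDim.MainDemoName.
CREATE TABLE #t
(
    TableID int IDENTITY(1,1),
    OLAPKey bigint,
    MainDemoName nvarchar(100)
);

DECLARE @agentkey bigint;
SET @agentkey = 2;

-- Part A: same logic as in the question, written with an explicit join.
WITH Result AS
(
    SELECT OLAPKey, DepartLevelKey, MainDemoName
    FROM OLAPAgentDim
    WHERE OLAPKey = @agentkey
    UNION ALL
    SELECT a.OLAPKey, a.DepartLevelKey, a.MainDemoName
    FROM OLAPAgentDim AS a
    INNER JOIN Result AS b ON a.FatherKey = b.OLAPKey
)
INSERT #t (OLAPKey, MainDemoName)
SELECT OLAPKey, MainDemoName
FROM Result
WHERE DepartLevelKey = 4;

-- (iii) Index the temp table before Part B groups and correlates on MainDemoName.
CREATE CLUSTERED INDEX IX ON #t (MainDemoName);

-- Part B from the question follows here.
```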
I have a database that can have data updated from two external parties.
Each of those parties sends a pipe delimited text file that is BULK INSERTED into the staging table.
I now want to change the schema for one of the parties by adding a few columns, but unfortunately this breaks the BULK INSERT for the other party, even though the new columns are all added as NULLABLE.
Is there any obvious solution to this?
TABLE SCHEMA:
CREATE TABLE [dbo].[CUSTOMER_ENTRY_LOAD](
[CARD_NUMBER] [varchar](12) NULL,
[TITLE] [varchar](6) NULL,
[LAST_NAME] [varchar](34) NULL,
[FIRST_NAME] [varchar](40) NULL,
[MIDDLE_NAME] [varchar](40) NULL,
[NAME_ON_CARD] [varchar](26) NULL,
[H_ADDRESS_PREFIX] [varchar](50) NULL,
[H_FLAT_NUMBER] [varchar](5) NULL,
[H_STREET_NUMBER] [varchar](10) NULL,
[H_STREET_NUMBER_SUFFIX] [varchar](5) NULL,
[H_STREET] [varchar](50) NULL,
[H_SUBURB] [varchar](50) NULL,
[H_CITY] [varchar](50) NULL,
[H_POSTCODE] [varchar](4) NULL,
[P_ADDRESS_PREFIX] [varchar](50) NULL,
[P_FLAT_NUMBER] [varchar](5) NULL,
[P_STREET_NUMBER] [varchar](10) NULL,
[P_STREET_NUMBER_SUFFIX] [varchar](5) NULL,
[P_STREET] [varchar](50) NULL,
[P_SUBURB] [varchar](50) NULL,
[P_CITY] [varchar](50) NULL,
[P_POSTCODE] [varchar](4) NULL,
[H_STD] [varchar](3) NULL,
[H_PHONE] [varchar](7) NULL,
[C_STD] [varchar](3) NULL,
[C_PHONE] [varchar](10) NULL,
[W_STD] [varchar](3) NULL,
[W_PHONE] [varchar](7) NULL,
[W_EXTN] [varchar](5) NULL,
[DOB] [smalldatetime] NULL,
[EMAIL] [varchar](50) NULL,
[DNS_STATUS] [bit] NULL,
[DNS_EMAIL] [bit] NULL,
[CREDITCARD] [char](1) NULL,
[PRIMVISACUSTID] [int] NULL,
[PREFERREDNAME] [varchar](100) NULL,
[STAFF_NUMBER] [varchar](50) NULL,
[CUSTOMER_ID] [int] NULL,
[IS_ADDRESS_VALIDATED] [varchar](50) NULL
) ON [PRIMARY]
BULK INSERT STATEMENT:
SET @string_temp = 'BULK INSERT customer_entry_load FROM '+char(39)+@inpath
+@current_file+'.txt'+char(39)+' WITH (FIELDTERMINATOR = '+char(39)+'|'+char(39)
+', MAXERRORS=1000, ROWTERMINATOR = '+char(39)+'\n'+char(39)+')'
SET DATEFORMAT dmy
EXEC(@string_temp)
The documentation describes how to use a format file to handle the scenario where the target table has more columns than the source file. An alternative that can sometimes be easier is to create a view on the table and BULK INSERT into the view instead of the table; this possibility is described in the same documentation.
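A minimal sketch of the view approach, assuming the schema shown in the question is the layout before the new columns are added and that the second party keeps sending only those 39 fields. The view name and the file path are hypothetical:

```sql
-- Hypothetical view exposing only the columns present in the older file layout.
-- Columns added to the table later are simply left out of the view.
CREATE VIEW dbo.CUSTOMER_ENTRY_LOAD_ORIGINAL_LAYOUT
AS
SELECT CARD_NUMBER, TITLE, LAST_NAME, FIRST_NAME, MIDDLE_NAME, NAME_ON_CARD,
       H_ADDRESS_PREFIX, H_FLAT_NUMBER, H_STREET_NUMBER, H_STREET_NUMBER_SUFFIX,
       H_STREET, H_SUBURB, H_CITY, H_POSTCODE,
       P_ADDRESS_PREFIX, P_FLAT_NUMBER, P_STREET_NUMBER, P_STREET_NUMBER_SUFFIX,
       P_STREET, P_SUBURB, P_CITY, P_POSTCODE,
       H_STD, H_PHONE, C_STD, C_PHONE, W_STD, W_PHONE, W_EXTN,
       DOB, EMAIL, DNS_STATUS, DNS_EMAIL, CREDITCARD, PRIMVISACUSTID,
       PREFERREDNAME, STAFF_NUMBER, CUSTOMER_ID, IS_ADDRESS_VALIDATED
FROM dbo.CUSTOMER_ENTRY_LOAD;
GO

SET DATEFORMAT dmy;

-- Hypothetical path; the options mirror the BULK INSERT statement in the question.
BULK INSERT dbo.CUSTOMER_ENTRY_LOAD_ORIGINAL_LAYOUT
FROM 'C:\import\party_b.txt'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', MAXERRORS = 1000);
```

The party with the extended layout would keep loading into the table directly, so each file type only sees the columns it actually supplies.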
And please always mention your SQL Server version.
Using OPENROWSET with BULK allows you to use your file in a query. You can use that to format the data and select only the columns you need.
In the end I have handled the two different cases with two different BULK INSERT statements (depending on which file is being processed). It seems like there isn't a way to do what I was trying to do with one statement.
You could use the format file idea supplied by @Pondlife.
Adapt your insert dynamically based on the input file name (provided there are unique differences between the external parties). Using a CASE expression, simply select the correct format file based on the unique identifier in the file name.
-- 'file1' and 'file2' stand in for the paths of the two format files
DECLARE @formatFile varchar (max);
SET @formatFile =
CASE
WHEN @current_file LIKE '%uniqueIdentifier%'
THEN 'file1'
ELSE 'file2'
END
SET @string_temp = 'BULK INSERT customer_entry_load FROM '+char(39)+@inpath
+@current_file+'.txt'+char(39)+' WITH (FORMATFILE = '+char(39)+@formatFile+char(39)
+')'
SET DATEFORMAT dmy
EXEC(@string_temp)
Hope that helps!