Which approach is better for this scenario? - sql-server

We have the following table:
CREATE TABLE [dbo].[CampaignCustomer](
[ID] [int] IDENTITY(1,1) NOT NULL,
[CampaignID] [int] NOT NULL,
[CustomerID] [int] NULL,
[CouponCode] [nvarchar](20) NOT NULL,
[CreatedDate] [datetime] NOT NULL,
[ModifiedDate] [datetime] NULL,
[Active] [bit] NOT NULL,
CONSTRAINT [PK_CampaignCustomer] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
and the following Unique Index:
CREATE UNIQUE NONCLUSTERED INDEX [IX_CampaignCustomer_CouponCode] ON [dbo].[CampaignCustomer]
(
[CouponCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 20) ON [PRIMARY]
GO
We query pretty constantly by CouponCode and by other foreign keys (not shown above for simplicity). The CampaignCustomer table has almost 4 million records and growing. We also run campaigns that don't require coupon codes, and for those we don't insert records at all. Now we need to start tracking those campaigns too, for another purpose. So we have 2 options:
1. Change the CouponCode column to allow NULLs and create a unique filtered index that excludes NULLs, letting the table grow even bigger and faster.
2. Create a separate table for tracking all campaigns for this specific purpose.
Keep in mind that the CampaignCustomer table is used very often for redeeming coupons and inserting new ones. The bottom line is that we don't want a customer redeeming a coupon to be left waiting until they give up, or other processes to fail. So, from an efficiency perspective, which option do you think is best and why?

I'd go for the filtered index... you're storing the same data so keep it in the same table.
Splitting the table is refactoring you probably don't need, and it adds complexity.
Do you have problems with 4 million rows? It's not that much, especially for such a narrow table.
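A minimal sketch of that option (the exact steps are an assumption on my part, not from the original post; the existing unique index has to be dropped before the column can be altered):
DROP INDEX [IX_CampaignCustomer_CouponCode] ON [dbo].[CampaignCustomer];
GO
-- Allow rows without a coupon code
ALTER TABLE [dbo].[CampaignCustomer] ALTER COLUMN [CouponCode] [nvarchar](20) NULL;
GO
-- Re-create the unique index, excluding the NULL rows from uniqueness enforcement
CREATE UNIQUE NONCLUSTERED INDEX [IX_CampaignCustomer_CouponCode]
ON [dbo].[CampaignCustomer] ([CouponCode] ASC)
WHERE [CouponCode] IS NOT NULL;
GO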

I'm against a duplicate table for the sake of a single column.
Allowing CouponCode to be NULL means that someone could accidentally create a record where the value is NULL when it should be a valid coupon code.
I would use a designated coupon code value that marks a row as a non-coupon campaign, rather than resorting to indicator columns like "isCoupon" or "isNonCouponCampaign", and use a filtered index to ignore that "nocoupon" value.
Which leads to my next point: I don't see a foreign key reference, but one would be key to knowing what coupons existed and which ones were actually used. Some of the columns in the existing table could be moved up to the parent coupon code table...
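For that sentinel-value variant, the filter would exclude the sentinel instead of NULL (the 'NOCOUPON' literal here is purely illustrative):
CREATE UNIQUE NONCLUSTERED INDEX [IX_CampaignCustomer_CouponCode]
ON [dbo].[CampaignCustomer] ([CouponCode] ASC)
WHERE [CouponCode] <> N'NOCOUPON'; -- many rows may share the sentinel; only real codes stay unique
GO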

Related

How do I insert a new record to a table containing just an IDENTITY column?

I have a single-column table where the column is a primary key and clustered index. It is used on other tables to relate records together. An INSERT statement doesn't seem to be the way to go since there are no other columns to populate, and it's a bit cumbersome to SET IDENTITY_INSERT ON and OFF, etc.
I just need to "increment" the primary key of the table to the next integer value.
I believe it's an easy problem to solve, but I'm at that stage of mental exhaustion where the wheel is still spinning but the hamster is dead.
Here is a script to recreate the table I'm working with.
CREATE TABLE [dbo].[PKOnly]
(
[Id] [BIGINT] IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_PKOnly]
PRIMARY KEY CLUSTERED ([Id] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY];
You can use DEFAULT VALUES:
INSERT dbo.PKOnly DEFAULT VALUES;
Example db<>fiddle
Note this will also work if you have other columns with defaults.
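If you also need the generated key back, one option (a sketch, not part of the original answer) is the OUTPUT clause:
INSERT dbo.PKOnly
OUTPUT inserted.Id -- returns the new identity value to the caller
DEFAULT VALUES;
Alternatively, SELECT SCOPE_IDENTITY(); immediately after the insert returns the same value.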

How can I add values of a column but only where the artist is the same, in SQL Server?

I have 2 tables, first table:
CREATE TABLE [dbo].[songs]
(
[ID_Song] [INT] IDENTITY(1,1) NOT NULL,
[SongTitle] [NVARCHAR](100) NOT NULL,
[ListenedCount] [INT] NOT NULL,
[Artist] [INT] NOT NULL,
CONSTRAINT [PK_songs]
PRIMARY KEY CLUSTERED ([ID_Song] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[songs] WITH CHECK
ADD CONSTRAINT [FK_songs_artists]
FOREIGN KEY([Artist]) REFERENCES [dbo].[artists] ([ID_Artist])
GO
And second table:
CREATE TABLE [dbo].[artists]
(
[ID_Artist] [INT] IDENTITY(1,1) NOT NULL,
[Name] [NVARCHAR](100) NOT NULL,
CONSTRAINT [PK_artists]
PRIMARY KEY CLUSTERED ([ID_Artist] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
As you can see column Artist in table Songs references column ID_Artist of table Artists.
I want to get all artists by summing up the ListenedCount of all their songs, keeping only those where the sum is greater than a given value.
I have trouble writing the query.
There are many ways to achieve this.
One is to compute the sum in a subquery (a derived table) and use that sum as a filter in the outer query:
select art.[Name], gba.[ListenedSum]
from [dbo].[artists] art
join
(
select sg.[Artist], sum(sg.[ListenedCount]) as [ListenedSum]
from [dbo].[songs] sg
group by sg.[Artist]
) as gba on gba.[Artist] = art.[ID_Artist]
where gba.[ListenedSum] > 1000000
A more direct way is to use HAVING:
select art.[Name], sum(sg.[ListenedCount]) as [ListenedSum]
from [dbo].[artists] art
join [dbo].[songs] sg on sg.[Artist] = art.[ID_Artist]
group by art.[Name]
having sum(sg.[ListenedCount]) > 1000000
It's interesting to note that the engine may execute these two queries differently (this is not guaranteed), so they can end up performing differently.
There's another interesting way using a CTE, though I think it's a bit more complicated; a sketch follows.
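The CTE variant would look roughly like this (same logic as the subquery version above):
with gba as
(
select sg.[Artist], sum(sg.[ListenedCount]) as [ListenedSum]
from [dbo].[songs] sg
group by sg.[Artist]
)
select art.[Name], gba.[ListenedSum]
from [dbo].[artists] art
join gba on gba.[Artist] = art.[ID_Artist]
where gba.[ListenedSum] > 1000000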

Updates on a specific table in a database via Informatica takes a long time at a specific time interval

I am an Informatica developer.
I have a dimension table in a shared application database (SQL Server).
I have 4 keys on the table, and a unique nonclustered index on it.
At the Informatica target, we have the same 4 columns set as key columns.
We do a lookup on this dimension table on the same 4 keys to flag whether a row is an insert or an update.
I have a schedule in Informatica where this job runs 6 times a day at different intervals. For all the morning loads the job runs quickly; the throughput of updates is around 1,000 rec/sec.
But for the evening load only, on this particular table, the throughput drops to 12-15 rec/sec; this started about a month ago.
We suspected that something else could be locking this table, or something at the database end, so we contacted the DBAs. They enabled a trace on this particular table but were not able to identify anything. If anyone could assist me in any way, or hint at where I might look, it would be great.
The Informatica server is also a shared server, but at the times the performance issue occurs, both the SQL Server and the Informatica server are lightly loaded. If I have missed anything, kindly let me know and I can provide additional information. The table has 94 columns, and its definition goes as below, with A being the surrogate key and B, C, D and E being the keys:
CREATE TABLE TEMP(
A [int] IDENTITY(1,1) NOT NULL,
B [varchar](8) NOT NULL,
C [varchar](11) NOT NULL,
D [varchar](3) NOT NULL,
E [varchar](2) NOT NULL,
F [varchar](1) NULL,
.
.
.
CP [char](1) NULL,
CONSTRAINT [PK_T_TEMP] PRIMARY KEY CLUSTERED
(
[A] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY],
CONSTRAINT [IX_T_TEMP] UNIQUE NONCLUSTERED
(
[A] ASC,
[B] ASC,
[C] ASC,
[D] ASC,
[E] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
All the columns get updated based on the 4 keys, so the update statement looks like the one below.
UPDATE TEMP
SET
F=?,
G=?,
H=?,
..
..
..
CP=?
WHERE B=? AND C=? AND D=? AND E=?;
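One thing worth checking, based purely on the DDL above (this observation is not from the original thread): the unique index leads on the surrogate key A, so the update's WHERE clause on B, C, D and E cannot seek on it and may fall back to scanning the clustered index. A hypothetical supporting index would look like:
CREATE NONCLUSTERED INDEX [IX_TEMP_BCDE] -- illustrative name
ON TEMP (B, C, D, E);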

SQL performance advice

I know this can be a tricky question and that it highly depends on context.
I have a database with a table containing around 10 million rows (each row contains 4 varchar columns).
There is a nonclustered index on the (varchar) field used in the WHERE clause.
I'm a bit confused because when selecting a row using a WHERE on the indexed column, it takes around a second to complete.
Any advice to improve this response time?
Would a clustered index be a good solution here?
Here is the table definition:
CREATE TABLE [dbo].[MYTABLE](
[ID] [uniqueidentifier] NOT NULL DEFAULT (newid()),
[CreationDate] [datetime] NOT NULL DEFAULT (getdate()),
[UpdateDate] [datetime] NULL,
[FIELD1] [varchar](9) NOT NULL,
[FIELD2] [varchar](max) NOT NULL,
[FIELD3] [varchar](1) NULL,
[FIELD4] [varchar](4) NOT NULL,
CONSTRAINT [PK_MYTABLE] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Here is the index definition:
CREATE UNIQUE NONCLUSTERED INDEX [IX_FIELD1] ON [dbo].[MYTABLE]
(
[FIELD1] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
And here is the query I use (very basic):
SELECT * FROM MYTABLE WHERE FIELD1 = 'DATA'
For this case, since you are selecting all columns (*), changing your nonclustered index to a clustered one will improve the time: accessing columns that are not part of a nonclustered index (neither key columns nor INCLUDE columns) requires an extra lookup to the page holding the actual data.
Since you can't have more than one clustered index per table, you will have to drop the existing one (the PRIMARY KEY in this case) and re-create the primary key as nonclustered afterwards.
ALTER TABLE dbo.MYTABLE DROP CONSTRAINT [PK_MYTABLE] -- This might fail if you have foreign keys referencing it
ALTER TABLE dbo.MYTABLE ADD CONSTRAINT PK_MYTABLE PRIMARY KEY NONCLUSTERED (ID)
DROP INDEX [IX_FIELD1] ON [dbo].[MYTABLE]
CREATE CLUSTERED INDEX [IX_FIELD1] ON [dbo].[MYTABLE] (FIELD1)
Index access times can also vary greatly depending on fragmentation (for example, if you have many inserts, deletes, or updates with values that don't land at the end or the start of the index).
Also keep in mind that if you are doing other operations such as joins, function calls, or additional WHERE filters, the engine might decide not to use the indexes.
If you are certain that the column used in your WHERE clause is unique, you may create a UNIQUE CLUSTERED INDEX on that column.
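For instance, with the table above (a sketch; FIELD1 was already enforced unique by the original nonclustered index):
CREATE UNIQUE CLUSTERED INDEX [IX_FIELD1] ON [dbo].[MYTABLE] (FIELD1);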
If values in that column are not unique, you can implement COVERING INDEXES. For example, if you had this index:
CREATE NONCLUSTERED INDEX IX_Column4 ON MyTable(Column4)
INCLUDE (Column1, Column2);
When executing the following query:
SELECT Column1, Column2 FROM MyTable WHERE Column4 LIKE 'Something';
You would (most likely) be using the IX_Column4 index. But when executing something like:
SELECT Column1, Column2, Column3 FROM MyTable WHERE Column4 LIKE 'Something';
You will not benefit from the advantages this kind of index has to offer.
If rows in your table are regularly INSERTED, DELETED, or UPDATED, you should check for INDEX FRAGMENTATION and REBUILD or REORGANIZE the affected indexes.
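As a sketch of that check (the thresholds below are a common rule of thumb, not from the original answer):
-- Fragmentation per index on MYTABLE
SELECT i.[name] AS IndexName, ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MYTABLE'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i ON i.[object_id] = ps.[object_id] AND i.index_id = ps.index_id;
-- Roughly: REORGANIZE between 5% and 30% fragmentation, REBUILD above 30%
ALTER INDEX [IX_FIELD1] ON [dbo].[MYTABLE] REORGANIZE;
ALTER INDEX [IX_FIELD1] ON [dbo].[MYTABLE] REBUILD;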
I would recommend the following articles in case you want to learn more about indexes:
Available index types, check out the guidelines offered for each kind of index.
SQL Server Index Design Guide
Hope it helps.

High index fragmentation

We have a SQL Server database that for some reason encounters massive index fragmentation (up to 85% and higher), and we find ourselves rebuilding indexes almost daily to stop this.
We are at a loss as to why this is happening. The tables use newsequentialid() to generate the new GUID primary keys, so we would expect new rows to always be added at the end, but that does not seem to be the case, which may be what produces the high fragmentation rate.
Does anyone have any ideas, things we can try to alleviate this problem or further diagnose it?
An example table would be :
CREATE TABLE [dbo].[EmailMessages](
[Id] [uniqueidentifier] NOT NULL,
[Name] [nvarchar](max) NULL,
CONSTRAINT [PK_dbo.EmailMessages] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[EmailMessages] ADD CONSTRAINT [DF__EmailMessage__Id__7993056A] DEFAULT (newsequentialid()) FOR [Id]
GO
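One way to dig further (a sketch, not from the original post): leaf-level page allocations in sys.dm_db_index_operational_stats include allocations caused by page splits, so a high count on a key that should be sequential points at something else, such as updates that widen the nvarchar(max) [Name] column and split pages after the fact.
SELECT i.[name] AS IndexName,
ios.leaf_allocation_count, -- includes leaf pages allocated by page splits
ios.leaf_update_count
FROM sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID('dbo.EmailMessages'), NULL, NULL) AS ios
JOIN sys.indexes AS i ON i.[object_id] = ios.[object_id] AND i.index_id = ios.index_id;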
