I have a table named "RawNews" with the following fields:
[NewsID] [decimal](18, 0) IDENTITY(1,1) NOT NULL,
[Title] [nvarchar](200) NULL,
[Description] [nvarchar](500) NULL,
[Text] [ntext] NULL,
[RegDate] [nvarchar](50) NULL,
[RegTime] [time](0) NULL,
[Status] [nvarchar](300) NULL,
[Tags] [nvarchar](50) NULL,
[SecurityLevelID] [smallint] NULL,
[IsDeleted] [bit] NULL,
[DelDate] [nchar](10) NULL,
[UserName] [nvarchar](50) NULL,
and another table named "UsedNews" which has the same fields plus
some other fields.
There are also some other tables related to these, like uploads, images, newsGroups, NewsRooms, Users, etc.
RawNews gets about 100 records each day, and those same 100 records go into UsedNews.
This information should be kept more or less forever.
I wanted to ask your advice on a good way to archive records so that search and filtering performance stays good.
And another question: I have a log table which logs every event in the system. Should I keep it in a separate database or not?
thanks a lot
I'd suggest table partitioning to solve this. The RegDate can be used to determine whether the records should be moved to the archive group or not.
You could choose to place the archive table on a separate disk, so that when searches against it do take place, they have the least possible effect on the rest of the "live" database.
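A minimal sketch of what that partition layout could look like, assuming RegDate is converted to a real datetime column (in the schema above it is nvarchar(50), which would need fixing first); the filegroup names and boundary dates below are just placeholders:
CREATE PARTITION FUNCTION pfNewsByDate (datetime)
AS RANGE RIGHT FOR VALUES ('20120101', '20130101');

CREATE PARTITION SCHEME psNewsByDate
AS PARTITION pfNewsByDate
TO (ARCHIVE_FG, ARCHIVE_FG, LIVE_FG); -- rows before 2013 land on the archive filegroup

-- The table's clustered index is then created ON psNewsByDate(RegDate),
-- so older rows physically live on the archive disk.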
There are many ways to do this:
You can use SQL Server's replication functionality; in your case the snapshot replication type seems appropriate (see the SQL Server documentation on snapshot replication for details).
You can create a Windows service to move data from RawNews to UsedNews.
You can create an exe that moves the data and call it via Windows Task Scheduler at a specific time when there is less load on the server.
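Whatever triggers it (a Windows service, a scheduled exe, or a SQL Agent job), the move itself can be a simple transactional insert-then-delete. A rough sketch, assuming an archive table named ArchivedNews with the same columns already exists and that @archiveCutoff marks the oldest date to keep (both names are mine):
BEGIN TRANSACTION;

-- If ArchivedNews.NewsID is also an IDENTITY column, SET IDENTITY_INSERT ArchivedNews ON first.
-- This also assumes RegDate values sort correctly as dates (ISO-formatted strings or a real date column).
INSERT INTO ArchivedNews (NewsID, Title, Description, [Text], RegDate, RegTime,
                          [Status], Tags, SecurityLevelID, IsDeleted, DelDate, UserName)
SELECT NewsID, Title, Description, [Text], RegDate, RegTime,
       [Status], Tags, SecurityLevelID, IsDeleted, DelDate, UserName
FROM RawNews
WHERE RegDate < @archiveCutoff;

DELETE FROM RawNews
WHERE RegDate < @archiveCutoff;

COMMIT TRANSACTION;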
This is a question more about design than about solving a problem.
I created three tables as follows:
CREATE TABLE [CapInvUser](
[UserId] [int] IDENTITY(1,1) NOT NULL,
[Name] [varchar](150) NOT NULL,
[AreaId] [int] NULL,
[Account] [varchar](150) NULL,
[mail] [varchar](150) NULL,
[UserLevelId] [int] NOT NULL
)
CREATE TABLE [CapInvUserLevel](
[UserLevelId] [int] IDENTITY(1,1) NOT NULL,
[Level] [varchar](50) NOT NULL
)
CREATE TABLE [CapInvUserRegistry](
[UserRegistryId] [int] IDENTITY(1,1) NOT NULL,
[UserLevelId] int NOT NULL,
[DateRegistry] DATE NOT NULL,
[RegistryStatus] VARCHAR(50) NOT NULL
)
I also have a view that shows all the data from the first table, with AreaId resolved to that table's varchar identifier, UserLevelId resolved to its varchar Level value, and the registry status from the last table joined in.
Right now when I want to register a new user, I insert into all three tables using separate queries, but I feel like I should have a way to insert into all of them at the same time.
I thought about using a stored procedure to insert, but I still don't know if that would be appropriate.
My question is
"Is there a more apropiate way of doing this?"
"Is there a way to create a view that will let me insert over it? (without passing the int value manually)"
-- These are just representations of the tables, not the real ones.
-- I'm still learning how to work with SQL Server properly.
Thank you for your answers and/or guidance.
The most common way of doing this, in my experience, is to write a stored procedure that does all three inserts in the necessary order to create the FK relationships.
This would be my unequivocal recommendation.
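A rough sketch of what such a procedure could look like for the tables above; the procedure and parameter names are mine, and whether the level row is created here or merely looked up depends on your actual workflow:
CREATE PROCEDURE dbo.RegisterCapInvUser
    @Name           varchar(150),
    @AreaId         int,
    @Account        varchar(150),
    @Mail           varchar(150),
    @Level          varchar(50),
    @RegistryStatus varchar(50)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    -- Create the level row here for illustration; in practice you would
    -- probably look up an existing UserLevelId instead.
    INSERT INTO CapInvUserLevel ([Level]) VALUES (@Level);
    DECLARE @UserLevelId int = SCOPE_IDENTITY();

    -- Create the user pointing at that level.
    INSERT INTO CapInvUser ([Name], AreaId, Account, mail, UserLevelId)
    VALUES (@Name, @AreaId, @Account, @Mail, @UserLevelId);

    -- Record the registration.
    INSERT INTO CapInvUserRegistry (UserLevelId, DateRegistry, RegistryStatus)
    VALUES (@UserLevelId, GETDATE(), @RegistryStatus);

    COMMIT TRANSACTION;
END
The application then calls one procedure (EXEC dbo.RegisterCapInvUser ...) instead of issuing three separate INSERTs, and the FK values never have to be passed around manually.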
I've got a client portal project (the first one I've developed so a basic best practice is what I'm looking for here, nothing fancy) nearing first release.
A simplification of the main record types used in reporting is the following:
CREATE TABLE [dbo].[conversions](
[conversion_id] [nvarchar](128) primary key NOT NULL,
[click_id] [int] NULL,
[conversion_date] [datetime] NOT NULL,
[last_updated] [datetime] NULL,
[click_date] [datetime] NULL,
[affiliate_affiliate_id] [int] NOT NULL,
[advertiser_advertiser_id] [int] NOT NULL,
[offer_offer_id] [int] NOT NULL,
[creative_creative_id] [int] NOT NULL,
[conversion_type] [nvarchar](max) NULL)
CREATE TABLE [dbo].[clicks](
[click_id] [int] primary key NOT NULL,
[click_date] [datetime] NOT NULL,
[affiliate_affiliate_id] [int] NOT NULL,
[advertiser_advertiser_id] [int] NOT NULL,
[offer_offer_id] [int] NOT NULL,
[campaign_id] [int] NOT NULL,
[creative_creative_id] [int] NOT NULL,
[ip_address] [nvarchar](max) NULL,
[user_agent] [nvarchar](max) NULL,
[referrer_url] [nvarchar](max) NULL,
[region_region_code] [nvarchar](max) NULL,
[total_clicks] [int] NOT NULL)
My specific question is: given millions of rows in each table, what mechanism is used to serve up summary reports quickly on demand given you know all the possible reports that can be requested?
The starting point, performance-wise: raw queries against 18 months' worth of data for the busiest client yield 3 to 5 seconds of latency on my dashboard, and the worst case is upwards of 10 seconds for a summary report with a custom date range spanning all the rows.
I know I can cache them after the first hit, but I want snappy performance on the first hit.
My feeling is that this is a fundamental aspect of an application of this nature, and that there are tons of applications like this out there. So is there an already well-thought-out method of pre-calculating tables that do the grouping and aggregation ahead of time? And how do you keep them up to date? Do you use SQL Agent and custom console apps that brute-force the calculations beforehand?
Any general pointers would be much appreciated.
Both tables are time series. They appear to be clustered by an ID column, which has little value for how time series are queried. Time series are almost always queried by date range, so your clustered organization should serve this type of query first and foremost: cluster by date, and move the ID primary key constraint into a non-clustered one.
CREATE TABLE [dbo].[conversions](
[conversion_id] [nvarchar](128) NOT NULL,
[conversion_date] [datetime] NOT NULL,
...
constraint pk_conversions primary key nonclustered ([conversion_id]))
go
create clustered index [cdx_conversions] on [dbo].[conversions]([conversion_date]);
go
CREATE TABLE [dbo].[clicks](
[click_id] [int] NOT NULL,
[click_date] [datetime] NOT NULL,
...
constraint [pk_clicks] primary key nonclustered ([click_id]));
go
create clustered index [cdx_clicks] on [dbo].[clicks]([click_date]);
This model will serve the typical queries that filter by a range on [click_date] and on [conversion_date]. For any other query the answer will be very specific to your query.
There are limits to how useful a relational, row-organized model can be for an OLAP/DW workload like yours. Specialized tools do a better job at it. Columnstore indexes can deliver amazingly fast responses, but they are difficult to update. Creating a MOLAP cube can also deliver blazing results, but that is a serious project undertaking. There are even specialized time-series databases out there.
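For reference, a nonclustered columnstore index on the clicks table might look like the sketch below (this assumes SQL Server 2012 or later; on 2012/2014 a nonclustered columnstore makes the table read-only, which is the update difficulty mentioned above). The column list is just an illustration of the reporting columns:
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_clicks_reporting
ON [dbo].[clicks] (click_date,
                   affiliate_affiliate_id,
                   advertiser_advertiser_id,
                   offer_offer_id,
                   campaign_id,
                   creative_creative_id,
                   total_clicks);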
I just experienced a database breakdown due to sudden, extraordinary data loading from disk.
I found the issue would arise when I attempted inserting into a log table with approx. 3.5 million rows. The table features an ID column set to IDENTITY, but with no indexes or unique constraints.
CREATE TABLE [dbo].[IntegrationTestLog](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Ident] [varchar](50) NULL,
[Date] [datetime] NOT NULL,
[Thread] [varchar](255) NOT NULL,
[Level] [varchar](50) NOT NULL,
[Logger] [varchar](255) NOT NULL,
[Message] [varchar](max) NOT NULL,
[Exception] [varchar](max) NULL
)
The issue was triggered by this line:
INSERT INTO IntegrationTestLog ([Ident],[Date],[Thread],[Level],[Logger],[Message],[Exception]) VALUES (@Ident, @log_date, @thread, @log_level, @logger, @message, @exception)
There are possibly many other queries that will trigger it, but this one I know for sure.
Bear with me, because I'm only guessing now, but does the identity seeding process somehow slow down if an index is missing? Could it by any slight chance fall back to doing a MAX(ID) query to get the latest entry? (Probably not.) I haven't succeeded in finding any deep technical information about the subject yet. Please share if you know of some literature or links on it.
To solve the issue, we ended up truncating the table, which itself took a VERY long time. I also promoted Id to be the primary key.
Then I read this article: Identity columns and found that truncate actually does touch the identity seed.
A truncate table (but not delete) will update the current seed to the
original seed value.
...which again only led me to be more suspicious of the identity seed.
Again I'm searching in the dark - please enlighten me on this issue if you have the insight.
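For reference, the change mentioned above (promoting Id to the primary key) would look roughly like this; by default a PRIMARY KEY constraint is clustered, so the table stops being a heap:
-- Adds a clustered primary key on Id; the table is no longer a heap afterwards.
ALTER TABLE [dbo].[IntegrationTestLog]
ADD CONSTRAINT PK_IntegrationTestLog PRIMARY KEY CLUSTERED ([Id]);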
I'll be keeping details of PageView counts for a specific table.
Table design is:
[IMAGE_ID] [int] IDENTITY(1,1) NOT NULL,
[IMAGE_PATH] [nvarchar](150) NOT NULL,
[CARTOON_ID] [int] NOT NULL,
[ADD_DATE] [datetime] NOT NULL,
[ADD_USER_ID] [int] NOT NULL,
[IMAGE_TEXT] [nvarchar](max) NULL
I'll be showing these images on each page and need the best way to keep the unique page view counts.
How would you do it?
Please remember that this table will have around 10,000 images in a short time and will see a lot of activity. Updating this table on each request doesn't seem clever to me.
I guess the best way is to keep a temp table with
IMAGE_ID
IP_ADDRESS
VISIT_DATE
and a view table that keeps
IMAGE_ID
COUNTER
and then periodically batch-update the view table with the details from the temp table and clear the temp table's contents.
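A rough sketch of that batch step, assuming the temp table is called PageViewStaging and the counter table ImageViewCounts (both names are mine, and MERGE needs SQL Server 2008 or later):
-- Roll distinct visitors per image from the staging table into the counter table,
-- then clear the staging table, all in one transaction.
-- Note: this only de-duplicates IP addresses within a single batch.
BEGIN TRANSACTION;

MERGE ImageViewCounts AS target
USING (SELECT IMAGE_ID, COUNT(DISTINCT IP_ADDRESS) AS NewViews
       FROM PageViewStaging
       GROUP BY IMAGE_ID) AS source
ON target.IMAGE_ID = source.IMAGE_ID
WHEN MATCHED THEN
    UPDATE SET target.[COUNTER] = target.[COUNTER] + source.NewViews
WHEN NOT MATCHED THEN
    INSERT (IMAGE_ID, [COUNTER]) VALUES (source.IMAGE_ID, source.NewViews);

DELETE FROM PageViewStaging;

COMMIT TRANSACTION;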
I'm looking for some advice on how to get the indexes working better for this query...
SQL Server 2005/2008 (some customers are on 2005, some on 2008)...
SELECT sales.ChainStoreId,
sales.CashBoxId,
dbo.DateOnly2(sales.BonDate),
MAX(sales.BonDate),
SUM(sales.SumPrice)
FROM [BACK_CDM_CLEAN_BOLTEN].[dbo].[CashBoxSales] sales
WHERE sales.BonType in ('B','P','W')
AND Del = 0
AND sales.BonDate >= @minDate
GROUP BY sales.ChainStoreId,
sales.CashBoxId,
dbo.DateOnly2(sales.BonDate)
Table looks like the following
CREATE TABLE [dbo].[CashBoxSales](
[SalesRowId] [int] IDENTITY(1,1) NOT NULL,
[ChainStoreId] [int] NOT NULL,
[CashBoxId] [int] NOT NULL,
[BonType] [char](1) NOT NULL,
[BonDate] [datetime] NOT NULL,
[BonNr] [nvarchar](20) NULL,
[SumPrice] [money] NOT NULL,
[Discount] [money] NOT NULL,
[EmployeeId] [int] NULL,
[DayOfValidity] [datetime] NOT NULL,
[ProcStatus] [int] NOT NULL,
[Del] [int] NOT NULL,
[InsertedDate] [datetime] NOT NULL,
[LastUpdate] [datetime] NOT NULL)
What would be the correct ordering of the index columns, and should it be a covering or a composite index, etc.?
The table has up to 10 million rows. There are other similar selects, but I'm hoping that with the advice from getting this one up to speed (it's the most important) I can tweak a few others.
Many thanks!
When you have your query in SQL Server Management Studio, just select "Analyze Query in Database Tuning Advisor" from the context menu, and off you go!
Mind you: this only tweaks this one single query in isolation! Adding indices here to speed this one query up might adversely affect other parts of your application. An index always comes with overhead - inserts and deletes tend to be slower.
Also, don't blindly implement all the recommendations of the DTA - use your own judgment as to whether an index makes sense or not.
And lastly: measure, measure, measure! Measure your performance before any changes as a baseline, then measure again and again after you've made the changes and compare.
My best advice is to run this query through the SQL Profiler. It will recommend some indexes for you to try.
Also, you might try setting up a partitioned table and use one of your GROUP BY columns as the partitioning key.
Off the top of my head I would start with
INDEX (BonType, Del, BonDate)
Or even just
INDEX (BonType, BonDate)
I would recommend using an index analyzer, Profiler, and benchmarking of various combinations.
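Written out as actual index definitions, those two suggestions might look like this; the INCLUDE columns on the first one (to cover the rest of the query) are my addition:
-- Suggestion 1: seek on BonType/Del, range-scan BonDate, and cover the selected columns.
CREATE NONCLUSTERED INDEX IX_CashBoxSales_BonType_Del_BonDate
ON [dbo].[CashBoxSales] (BonType, Del, BonDate)
INCLUDE (ChainStoreId, CashBoxId, SumPrice);

-- Suggestion 2: the leaner variant.
CREATE NONCLUSTERED INDEX IX_CashBoxSales_BonType_BonDate
ON [dbo].[CashBoxSales] (BonType, BonDate);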