I am a bit rusty with my SQL since I have not worked with it beyond basic querying of existing databases that were already set up.
I am trying to create an event logging database, and want to take an "extreme" approach to normalization. I would have a main table composed mostly of 'smallint' fields that point to child tables which contain strings.
Example:
I have an external system in which I would like to enable some logging via SQL. The user fills in some key parameters, which build an insert/update statement that gets pushed to the logging tables, so the entries can be viewed later if they need to know what XYZ value was at runtime, or at some time in the past.
I have a main table which consists of:
SELECT [log_id] - bigint (auto-increment) PK
,[date_time] - smalldatetime
,[cust_id] - smallint FK
,[recloc] - char(8)
,[alert_level] - smallint FK
,[header] - varchar(100)
,[body] - varchar(1000)
,[process_id] - smallint FK
,[routine_id] - smallint FK
,[workflow_id] - smallint FK
FROM [EventLogs].[dbo].[eventLogs]
All of the 'smallint' fields point to a child table which contains the expanded data:
Example:
SELECT [routine_id] PK/FK
,[routine_name]
,[description]
FROM [EventLogs].[dbo].[cpRoutine]
SELECT [process_id] PK/FK
,[process_name]
,[description]
FROM [EventLogs].[dbo].[cpProcess]
My goal here is to have the external system do an update/insert statement that reaches all these tables. I have all the 'smallint' fields linked up as FKs currently.
How do I go about crafting the update/insert statements that touch all these tables? If a child table already contains a key-value pair, I do not want to touch it. The idea of the child tables is to house repetitive data there and assign it a key in the main logging table to keep size down. Do I need to check for the existence of a record in each child table, save its id, then build my insert statement for the main table? I am trying to be as efficient as possible here.
Example:
I want to log the following from the external system:
- date_time - GETDATE()
- customer_number - '0123456789'
- recloc - 'ABC123'
- alert_level - 'info'
- header - 'this is a header'
- body - 'this is a body'
- process_name - 'the process'
- routine_name - 'the routine'
- workflow_name - 'the workflow'
Do I need to create my insert statement for the main table (eventLogs) but check each child table first and add missing values, then save the id for my insert statement in the main table?
Select process_id, process_name From cpProcess where process_name = 'the process'
If no values are returned, do an insert statement with the process_name
Now query the table again to get the ID so I can build the "main insert statement" that feeds the master log table
Repeat for all other child tables
The final insert statement looks something like:
SQL code:
INSERT INTO eventLogs (date_time, cust_id, recloc, alert_level, header, body, process_id, routine_id, workflow_id)
VALUES('2017-12-31', '1', 'ABC123', '3', 'this is a header', 'this is a body', '13', '19', '12')
It just seems like I am doing too much back and forth with the server, checking for values in the child tables before I can do my insert.
The end goal here is to create a friendly view that pulls in all the data assigned to the 'smallint' keys.
You're close:
Select process_id from cpProcess where process_name = 'the process'
If no values are returned, do an insert statement with the process_name and get the new ID through SCOPE_IDENTITY() (usually the safest choice, since it is limited to the current scope), IDENT_CURRENT, or @@IDENTITY (or use a subordinate "load" procedure and get the ID from an output parameter).
Repeat for each child table until you get the values required to do your final insert into [eventLogs].
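The steps above could be wrapped in one small "get or create" procedure per child table. This is only a minimal sketch: the procedure name [get_or_create_process] is hypothetical, and it assumes [cpProcess].[process_id] is a smallint IDENTITY column and that [process_name] fits in varchar(100).

```sql
-- Hypothetical helper (name and parameter sizes are assumptions):
-- returns the id for a process name, inserting the row only if it is missing.
CREATE PROCEDURE dbo.get_or_create_process
    @process_name VARCHAR(100),
    @process_id   SMALLINT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    -- Ignore whatever value the caller passed in the output parameter.
    SET @process_id = NULL;

    -- Step 1: look for an existing row.
    SELECT @process_id = process_id
    FROM dbo.cpProcess
    WHERE process_name = @process_name;

    -- Step 2: nothing found, so insert and capture the new identity value.
    IF @process_id IS NULL
    BEGIN
        INSERT INTO dbo.cpProcess (process_name)
        VALUES (@process_name);

        SET @process_id = CAST(SCOPE_IDENTITY() AS SMALLINT);
    END;
END;
GO

-- Usage: resolve the key, then use @process_id in the final INSERT into eventLogs.
DECLARE @process_id SMALLINT;

EXEC dbo.get_or_create_process
     @process_name = 'the process',
     @process_id   = @process_id OUTPUT;
```

Two sessions could still both miss the row and insert it at the same moment, so a unique constraint on [process_name] is worth considering as a backstop.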
This works fine if it is a relatively low speed process. As you increase the speed you can have issues, but if you are doing INSERT only, as you should, it still isn't terrible. I've used SQL Server Service Broker in the past to decouple processes such as these to improve performance, but that obviously adds complexity.
Depending on the load you might also decide to build aggregate tables in a fact/dimension star so that the INSERT OLTP process is segregated from the SELECT OLAP process.
What you're seeing is the complexity involved in building a normalized data structure. Your plan "to take an "extreme" approach to normalization" is often bypassed because it's "too hard". That doesn't mean you shouldn't do it, but you should weigh the ROI. In the past I have decided to just dump everything into a single log table, such as the one below, where there were only going to be perhaps fewer than ten thousand records at any given time. You just have to look at the requirements and make the best choice.
CREATE TABLE [log].[data]
(
[id] INT IDENTITY(1, 1)
, [timestamp] DATETIME DEFAULT sysdatetime()
, [entry] XML NOT NULL
);
One option that I frequently use during the build out phase of a design is to build placeholders behind adapters as shown below. Use the getter and setter methods ALWAYS and later, when you need better performance or data storage, you can refactor the underlying data structure as required, modify the adapters to the new data structures, and you've saved yourself some time. Otherwise you can end up chasing a lot of rabbits down holes early in the project. Often you'll find that your design for the underlying structures changes based on requirements as the project moves forward and you'd have spent a lot of time on changes. Using this approach you get a working mechanism in place immediately.
Later on if you need to collapse this structure to provide better performance it will be trivial compared to constantly changing the structure during design (in my opinion).
Oh, and yes, you could use a standard relational table. I use a lot of XML in applications and event logging because it allows ad hoc structured data. The concept is the same. You could use your top level table, just with the [process_name], etc. columns directly in the table and no child tables for now.
Just remember you should NOT allow access to the underlying tables directly! One way to prevent this is to actually put them in a dedicated schema such as [log_secure], and secure that schema to all but admin and the accessor/mutator methods.
IF schema_id(N'log') IS NULL
EXECUTE (N'CREATE SCHEMA log');
go
IF object_id(N'[log].[data]', N'U') IS NOT NULL
DROP TABLE [log].[data];
go
CREATE TABLE [log].[data]
(
[id] BIGINT IDENTITY(1, 1)
, [timestamp] DATETIMEOFFSET NOT NULL -- DATETIME if timezone isn't needed
CONSTRAINT [log__data__timestamp__df] DEFAULT sysdatetimeoffset()
, [entry] XML NOT NULL,
CONSTRAINT [log__data__id__pk] PRIMARY KEY CLUSTERED ([id])
);
IF object_id(N'[log].[get_entry]', N'P') IS NOT NULL
DROP PROCEDURE [log].[get_entry];
go
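CREATE PROCEDURE [log].[get_entry] @id BIGINT
, @entry XML output
, @begin DATETIMEOFFSET = NULL -- not used yet; kept as a placeholder
, @end DATETIMEOFFSET = NULL -- not used yet; kept as a placeholder
AS
BEGIN
-- assign the stored XML to the output parameter
SELECT @entry = [entry]
FROM [log].[data]
WHERE [id] = @id;
END;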
go
IF object_id(N'[log].[set_entry]', N'P') IS NOT NULL
DROP PROCEDURE [log].[set_entry];
go
CREATE PROCEDURE [log].[set_entry] @entry XML
, @timestamp DATETIMEOFFSET = NULL
, @id BIGINT output
AS
BEGIN
-- the table created above is [log].[data]
INSERT INTO [log].[data]
([timestamp]
, [entry])
VALUES ( COALESCE(@timestamp, sysdatetimeoffset()), @entry );
SET @id = SCOPE_IDENTITY();
END;
go
Suppose a table in SQL Server with this structure:
TABLE t (Id INT PRIMARY KEY)
Then I have a stored procedure, which is constantly being called, that inserts data into this table, among other things:
BEGIN TRAN
DECLARE @Id INT = (SELECT MAX(Id) + 1 FROM t)
INSERT t VALUES (@Id)
...
-- Stuff that takes a long time to complete
...
COMMIT
The problem with this approach is that sometimes I get a primary key violation, because two or more procedure calls read the same Id and try to insert it into the table.
I have been able to solve this problem by adding a TABLOCK hint to the SELECT statement:
DECLARE @Id INT = (SELECT MAX(Id) + 1 FROM t WITH (TABLOCK))
The problem now is that successive calls to the procedure must wait for the transaction currently being executed to complete before they can start their work, so only one procedure runs at a time.
Is there any advice or trick to hold the lock just during the execution of the SELECT and INSERT statements?
Thanks.
TABLOCK is a terrible idea, since you're serialising all the calls (no concurrency).
Note that with an SP you will retain all the locks granted over the run until the SP completes.
So you want to minimise locks except for where you really need them.
Unless you have a special case, use an internally generated id:
CREATE TABLE t (Id INT IDENTITY PRIMARY KEY)
Improved performance, concurrency etc. since you are not dependent on external tables to manage the id.
If you have existing data you can (re)set the start value using DBCC
DBCC CHECKIDENT ('t', RESEED, 100)
If you need to inject rows with a value preassigned, use:
SET IDENTITY_INSERT t ON
(and off again afterwards, resetting the seed as required).
[Consider whether you want this value to be the primary key, or simply unique.
In many cases where you need to reference a table's PK as an FK you'll want it as the PK for simplicity of joins, but having a business-readable value (e.g. Accounting Code, or OrderNo + OrderLine) as the key is completely valid: that's just modelling.]
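As a rough sketch of how the procedure body from the question might look once the key is generated by the server (this assumes the table can be recreated with an IDENTITY column, which is not something the question confirms):

```sql
-- Sketch: let SQL Server generate the key instead of computing MAX(Id) + 1.
CREATE TABLE t (Id INT IDENTITY(1, 1) PRIMARY KEY);
GO

BEGIN TRAN;

DECLARE @Id INT;

-- A table whose only column is an identity column is filled with DEFAULT VALUES.
INSERT INTO t DEFAULT VALUES;

-- Identity value generated by the INSERT above, in this scope only.
SET @Id = SCOPE_IDENTITY();

-- ... long-running work that uses @Id ...

COMMIT;
```

Concurrent calls each receive their own value without a table-level lock. Identity values are not reused after a rollback, so gaps in the sequence are possible.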
I want to develop a BizTalk orchestration that inserts multiple records into multiple DB tables and retrieves the inserted records from those tables, all in a single instance of the orchestration. I'm able to insert the data in one instance, but I'm having difficulty retrieving the inserted data for that instance, as every record has unique values. In my situation I have to use stored procedures, in order to apply some other business logic. So I have two candidate methods, both calling stored procedures through the "Wcf_Custom Adapter composite feature", as described below.
-> Method1
I develop a stored procedure which takes a LoadDate ("2016-05-12 10:11:22.147") as a parameter along with the values to insert, and which inserts the records for that instance, stamping them with the given LoadDate. The orchestration then immediately calls a Get stored procedure, which takes the same LoadDate ("2016-05-12 10:11:22.147") as a parameter and retrieves the recently inserted records from the DB based on the LoadDate value.
I know that retrieving data from SQL Server based on a date value is bad practice, and it will cause performance issues too.
-> Method2
I'll design the target tables with a bool column named "New", whose value will be 0 or 1. I'll develop an Insert stored procedure, which inserts the data with the "New" column set to "1". The orchestration then immediately calls a Get stored procedure, which takes no parameters and retrieves the recently inserted records, i.e. those whose "New" indicator is "1". Once it has retrieved the data, it updates the "New" column value to "0".
I prefer Method 2. But do we have a better option?
As @johns-305 mentioned in his comment, you should use a table-valued parameter in your stored procedure: assemble all your data in the orchestration, then make a single call to the procedure.
A sample stored procedure might look like this:
CREATE TYPE [dbo].[SampleDataTable_Type] AS TABLE(
[ID] [int] NOT NULL,
[Name] [varchar](50) NOT NULL,
PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (IGNORE_DUP_KEY = OFF)
)
GO
CREATE PROCEDURE [dbo].[sp_InsertSampleTableData]
(
@LoadDate DATETIME,
@data [SampleDataTable_Type] READONLY
)
AS
BEGIN
SET NOCOUNT ON
-- your_table is a placeholder for the real target table
INSERT INTO your_table(id, name)
SELECT id, name FROM @data;
--Do whatever you want
SET NOCOUNT OFF
END
GO
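For reference, calling such a procedure from plain T-SQL could look like the sketch below. The row values are made up, and the BizTalk adapter would build the equivalent call from the message rather than from a script like this.

```sql
-- Build the rows in a variable of the table type, then pass them in one call.
DECLARE @rows [dbo].[SampleDataTable_Type];

INSERT INTO @rows ([ID], [Name])
VALUES (1, 'first row'),
       (2, 'second row');

DECLARE @now DATETIME = GETDATE();

EXEC [dbo].[sp_InsertSampleTableData]
     @LoadDate = @now,
     @data     = @rows;
```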
I think your stored procedure may look like this:
create procedure myProc
@a int, @b varchar(100)
as
insert myTable(a,b,c)
OUTPUT inserted.* --this line returns the inserted rows to the caller
select a,b,c
from somewhere
where a=@a and b=@b
Is there any way to disable automatic creation of statistics on a specific table in a database, without disabling it for the entire database?
I have a procedure which is written as follows:
create proc
as
create table #someTempTable(many columns, more than 100)
inserting into #someTempTable **always one or two row**
exec proc1
exec proc2
etc.
proc1, proc2, etc. contain many selects and updates like this:
select ..
from #someTempTable t
join someOrdinaryTable t2 on ...
update #someTempTable set col1 = somevalue
Profiler shows that before each select the server starts collecting statistics on #someTempTable, and this takes more than a quarter of the procedure's entire execution time. The procedure is used in OLTP processing and needs to run very fast. I would like to change the temporary table to a table variable (because the server doesn't collect statistics for table variables), but I can't: that would mean rewriting all these procedures to pass the variable between them, and all of this legacy code would have to be retested. I'm looking for an alternative way to make the server treat the temporary table like a table variable as far as statistics collection is concerned.
P.S. I know that statistics are useful, but in this case they are useless because the table always contains a small number of records.
I assume you know what you are doing. Disabling statistics is generally a bad idea. Anyhow:
EXEC sp_autostats 'table_name', 'OFF'
More documentation here: https://msdn.microsoft.com/en-us/library/ms188775.aspx.
Edit: OP clarified that he wants to disable statistics for a temp table. Try this:
CREATE TABLE #someTempTable
(
ID int PRIMARY KEY WITH (STATISTICS_NORECOMPUTE = ON),
...other columns...
)
If you don't have a primary key already, use an identity column for a PK.
Backstory
At work we're planning on deprecating a Natural Key column in one of our primary tables. The project consists of 100+ applications that link to this table/column; 400+ stored procedures that reference this column directly; and a vast array of common tables between these applications that also reference this column.
The Big Bang and Start from Scratch methods are out of the picture. We're going to deprecate this column one application at a time, certify the changes, and move on to the next... and we've got a lengthy target goal to make this effort practical.
The problem I have is that a lot of these applications have shared stored procedures and tables. If I completely convert all of Application A's tables/stored procedures Application B and C will be broken until converted. These in turn may break applications D, E, F...Etc. I've already got a strategy implemented for Code classes and Stored Procedures, the part I'm stuck on is the transitioning state of the database.
Here's a basic example of what we have:
Users
---------------------------
Code varchar(32) natural key
Access
---------------------------
UserCode varchar(32) foreign key
AccessLevel int
And we're aiming now just for transitional state like this:
Users
---------------------------
Code varchar(32)
Id int surrogate key
Access
---------------------------
UserCode varchar(32)
UserID int foreign key
AccessLevel int
The idea being during the transitional phase un-migrated applications and stored procedures will still be able to access all the appropriate data and new ones can start pushing to the correct columns -- Once the migration is complete for all stored procedures and applications we can finally drop the extra columns.
I wanted to use SQL Server's triggers to automatically intercept any new Insert/Update's and do something like the following on each of the affected tables:
CREATE TRIGGER tr_Access_Sync
ON Access
INSTEAD OF INSERT(, UPDATE)
AS
BEGIN
DECLARE @code varchar(32)
DECLARE @id int
SET @code = (SELECT inserted.code FROM inserted)
SET @id = (SELECT inserted.id FROM inserted)
-- This is a migrated application; find the appropriate legacy key
IF @code IS NULL AND @id IS NOT NULL
SELECT @code = Code FROM Users WHERE Users.Id = @id
-- This is a legacy application; find the appropriate surrogate key
IF @id IS NULL AND @code IS NOT NULL
SELECT @id = Id FROM Users WHERE Users.Code = @code
-- Impossible code:
UPDATE inserted SET inserted.code=@code, inserted.id=@id
END
Question
The 2 huge problems I'm having so far are:
I can't do an "AFTER INSERT" because NULL constraints will make the insert fail.
The "impossible code" I mentioned is how I'd like to cleanly proxy the original query; If the original query has x, y, z columns in it or just x, I ideally would like the same trigger to do these. And if I add/delete another column, I'd like the trigger to remain functional.
Anyone have a code example where this could be possible, or even an alternate solution for keeping these columns properly filled even when only one of values is passed to SQL?
Tricky business...
OK, first of all: this trigger will NOT work in many circumstances:
SET @code = (SELECT inserted.code FROM inserted)
SET @id = (SELECT inserted.id FROM inserted)
The trigger can be called with a set of rows in the Inserted pseudo-table - which one are you going to pick here?? You need to write your trigger in such a fashion that it will work even when you get 10 rows in the Inserted table. If a SQL statement inserts 10 rows, your trigger will not be fired ten times - one for each row - but only once for the whole batch - you need to take that into account!
Second point: I would try to make the ID's IDENTITY fields - then they'll always get a value - even for "legacy" apps. Those "old" apps should provide a legacy key instead - so you should be fine there. The only issue I see and don't know how you handle those are inserts from an already converted app - do they provide an "old-style" legacy key as well? If not - how quickly do you need to have such a key?
What I'm thinking about would be a "cleanup job" that would run over the table, get all the rows with a NULL legacy key, and then provide some meaningful value for it. Make this a regular stored procedure and execute it on a schedule (every day, every four hours, every 30 minutes, whatever suits your needs). Then you don't have to deal with triggers and all the limitations they have.
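A minimal sketch of such a cleanup procedure, using the transitional schema from the question. The procedure name [sync_access_keys] is hypothetical, and the sketch assumes both [UserCode] and [UserID] allow NULL while the migration is in progress.

```sql
-- Hypothetical cleanup procedure; names follow the transitional schema
-- in the question (Users.Code, Users.Id, Access.UserCode, Access.UserID).
CREATE PROCEDURE dbo.sync_access_keys
AS
BEGIN
    SET NOCOUNT ON;

    -- Rows written by legacy applications: fill in the surrogate key.
    UPDATE a
    SET    a.UserID = u.Id
    FROM   dbo.Access AS a
    JOIN   dbo.Users  AS u ON u.Code = a.UserCode
    WHERE  a.UserID IS NULL;

    -- Rows written by migrated applications: fill in the legacy key.
    UPDATE a
    SET    a.UserCode = u.Code
    FROM   dbo.Access AS a
    JOIN   dbo.Users  AS u ON u.Id = a.UserID
    WHERE  a.UserCode IS NULL;
END;
```

Both statements are set-based, so they handle any number of pending rows in one pass. The procedure could be scheduled with a SQL Server Agent job.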
Wouldn't it be possible to make the schema changes 'bigbang' but create views over the top of those tables that 'hide' the change?
I think you might find you are simply putting off the breakages to a later point in time: "We're going to deprecate this column one application at a time" - it might be my naivety but I can't see how that's ever going to work.
Surely, a worse mess can occur when different applications are doing things differently?
After sleeping on the problem, this seems to be the most generic/re-usable solution I could come up with within the SQL syntax. It works fine even if both columns have a NOT NULL constraint, and even if you don't reference the "other" column at all in your insert.
CREATE TRIGGER tr_Access_Sync
ON Access
INSTEAD OF INSERT
AS
BEGIN
/*-- Create a temporary table to modify because "inserted" is read-only */
/*-- "temp" is actually "#temp" but it throws off stackoverflow's syntax highlighting */
SELECT * INTO temp FROM inserted
/*-- If for whatever reason the secondary table has its own identity column */
/*-- we need to get rid of it from our #temp table to do an Insert later with identities on */
ALTER TABLE temp DROP COLUMN oneToManyIdentity
UPDATE temp
SET
UserCode = ISNULL(UserCode, (SELECT UserCode FROM Users U WHERE U.UserID = temp.UserID)),
UserID = ISNULL(UserID, (SELECT UserID FROM Users U WHERE U.UserCode = temp.UserCode))
INSERT INTO Access SELECT * FROM temp
END
We have a table that will store versions of records.
The columns are:
Id (Guid)
VersionNumber (int)
Title (nvarchar)
Description (nvarchar)
etc...
Saving an item will insert a new row into the table with the same Id and an incremented VersionNumber.
I am not sure how best to generate the sequential VersionNumber values. My initial thought is to:
SELECT @NewVersionNumber = MAX(VersionNumber) + 1
FROM VersionTable
WHERE Id = @ObjectId
And then use @NewVersionNumber in my insert statement.
If I use this method, do I need to set my transaction as serializable to avoid concurrency issues? I don't want to end up with duplicate VersionNumbers for the same Id.
Is there a better way to do this that doesn't make me use serializable transactions?
In order to avoid concurrency issues (or, in your specific case, duplicate inserts) you could create a compound key as the primary key for your table, consisting of the Id and VersionNumber columns. This enforces uniqueness on that pair of columns.
Your insert routine can then be written to CATCH an insert error caused by a duplicate key and simply re-issue the insert.
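As a rough illustration of the constraint-plus-retry idea: the table definition below is a cut-down assumption (only the key columns and Title), the retry limit of three is arbitrary, and THROW needs SQL Server 2012 or later.

```sql
-- Assumed, simplified table: the compound primary key is what matters here.
CREATE TABLE VersionTable
(
    Id            UNIQUEIDENTIFIER NOT NULL,
    VersionNumber INT              NOT NULL,
    Title         NVARCHAR(200)    NULL,
    CONSTRAINT PK_VersionTable PRIMARY KEY (Id, VersionNumber)
);
GO

DECLARE @ObjectId UNIQUEIDENTIFIER = NEWID();      -- placeholder value
DECLARE @Title    NVARCHAR(200)    = N'some title'; -- placeholder value
DECLARE @Attempts INT = 0;

WHILE @Attempts < 3
BEGIN
    BEGIN TRY
        -- MAX() over no rows yields NULL, so the first version becomes 1.
        INSERT INTO VersionTable (Id, VersionNumber, Title)
        SELECT @ObjectId, COALESCE(MAX(VersionNumber), 0) + 1, @Title
        FROM VersionTable
        WHERE Id = @ObjectId;

        BREAK; -- insert succeeded, leave the retry loop
    END TRY
    BEGIN CATCH
        -- 2627 = violation of a PRIMARY KEY or UNIQUE constraint.
        -- Re-raise anything else, or a duplicate on the final attempt.
        IF ERROR_NUMBER() <> 2627 OR @Attempts >= 2
        BEGIN
            THROW;
        END;

        SET @Attempts += 1;
    END CATCH;
END;
```

A losing session simply recomputes the next number on its retry, so no serializable transaction is needed for correctness.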
It may also be worth mentioning that unless you specifically need to use a GUID i.e. because of working with SQL Server Replication or multiple data sources, that you should consider using an alternative data type such as BIGINT.
I had thought that the following single insert statement would avoid concurrency issues, but after Heinzi's excellent answer to my question here it turns out that this is not safe at all:
Insert Into VersionTable
(Id, VersionNumber, Title, Description, ...)
Select @ObjectId, max(VersionNumber) + 1, @Title, @Description
From VersionTable
Where Id = @ObjectId
I'm leaving it just for reference. Of course this would work with either table hints or a transaction isolation level of Serializable, but overall the best solution is to use a constraint.
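For completeness, the table-hint variant mentioned above might look like the sketch below. UPDLOCK with HOLDLOCK is one commonly used hint combination for this read-then-insert pattern, not necessarily the one the author had in mind. The variable values are placeholders, and an index on Id is assumed so that the range lock stays narrow.

```sql
DECLARE @ObjectId UNIQUEIDENTIFIER = NEWID();      -- placeholder value
DECLARE @Title    NVARCHAR(200)    = N'some title'; -- placeholder value

BEGIN TRAN;

-- UPDLOCK + HOLDLOCK keeps the rows (and key range) read for this Id
-- locked until COMMIT, so a concurrent caller for the same Id has to wait.
INSERT INTO VersionTable (Id, VersionNumber, Title)
SELECT @ObjectId, COALESCE(MAX(VersionNumber), 0) + 1, @Title
FROM VersionTable WITH (UPDLOCK, HOLDLOCK)
WHERE Id = @ObjectId;

COMMIT;
```

Even with the hints in place, keeping the unique constraint on (Id, VersionNumber) is a cheap safety net.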