I am trying to migrate the AdventureWorks database to SQL Server Data Warehouse using PolyBase.
Suppose I have a schema HumanResources and a table Department in that schema.
CREATE TABLE [HumanResources].[Department]
(
[DepartmentID] [smallint] NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[ModifiedDate] [datetime] NOT NULL
)
I need to create an external table for the data of [HumanResources].[Department] before loading the data from Azure Blob storage into SQL Server Data Warehouse.
CREATE EXTERNAL TABLE ex.TableName
(
[DepartmentID] [smallint] NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[ModifiedDate] [datetime] NOT NULL
)
WITH (
LOCATION='/path/',
DATA_SOURCE=AzureStorage,
FILE_FORMAT=TextFile
);
I am creating all external tables under an [ex] schema. How should I represent the original schema to avoid collisions?
I cannot do [ex].[HumanResources].[Department] and I would like to avoid creating unnecessary schemas for external tables.
Is there an easy way of representing this?
A common pattern we see is to simply add _ext to the end of the table name. So following your example you'd have the following:
[HumanResources].[Department]
[HumanResources].[Department_ext]
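As a rough sketch of that naming pattern, reusing the column list, LOCATION, DATA_SOURCE and FILE_FORMAT values from the question, the external table can sit in the original schema next to the table it feeds. The INSERT ... SELECT load at the end is only one possible way to move the rows and is an assumption, not something the question specifies.

```sql
-- External table lives in the same schema as its target; only the _ext suffix differs.
CREATE EXTERNAL TABLE [HumanResources].[Department_ext]
(
    [DepartmentID] [smallint] NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
    [ModifiedDate] [datetime] NOT NULL
)
WITH (
    LOCATION = '/path/',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = TextFile
);

-- Assumed load step: copy the rows from blob storage into the real table.
INSERT INTO [HumanResources].[Department] ([DepartmentID], [Name], [ModifiedDate])
SELECT [DepartmentID], [Name], [ModifiedDate]
FROM [HumanResources].[Department_ext];
```

With this layout no extra schema is needed, and the external tables are easy to find and drop after the load by filtering on the suffix.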
This is a question more about design than about solving a problem.
I created three tables as follows:
CREATE TABLE [CapInvUser](
[UserId] [int] IDENTITY(1,1) NOT NULL,
[Name] [varchar](150) NOT NULL,
[AreaId] [int] NULL,
[Account] [varchar](150) NULL,
[mail] [varchar](150) NULL,
[UserLevelId] [int] NOT NULL
)
CREATE TABLE [CapInvUserLevel](
[UserLevelId] [int] IDENTITY(1,1) NOT NULL,
[Level] [varchar](50) NOT NULL
)
CREATE TABLE [CapInvUserRegistry](
[UserRegistryId] [int] IDENTITY(1,1) NOT NULL,
[UserLevelId] int NOT NULL,
[DateRegistry] DATE NOT NULL,
[RegistryStatus] VARCHAR(50) NOT NULL,
)
With a view that shows all the data on the first table with "AreaId" being parsed as the varchar identifier of that table, the UserLevel being parsed as the varchar value of that table, and a join of the registry status of the last one.
Right now when I want to register a new user, I insert into all three tables using separate queries, but I feel like I should have a way to insert into all of them at the same time.
I thought about using a stored procedure to insert, but I still don't know if that would be appropriate.
My question is
"Is there a more appropriate way of doing this?"
"Is there a way to create a view that will let me insert over it? (without passing the int value manually)"
--These are just representations of the tables, not the real ones.
-- I'm still learning how to work with SQL Server properly.
Thank you for your answers and/or guidance.
The most common way of doing this, in my experience, is to write a stored procedure that does all three inserts in the necessary order to create the FK relationships.
This would be my unequivocal recommendation.
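A minimal sketch of such a procedure, based on the three representative tables in the question, might look like the following. The procedure name, the parameter names, and the choice to look the level up by its text (creating it if missing) are assumptions made for illustration; the real tables may need a different order or extra columns.

```sql
-- Hypothetical procedure: registers a user with one call instead of three separate queries.
CREATE PROCEDURE RegisterCapInvUser
    @Name           varchar(150),
    @Level          varchar(50),
    @RegistryStatus varchar(50),
    @AreaId         int          = NULL,
    @Account        varchar(150) = NULL,
    @Mail           varchar(150) = NULL
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- any runtime error rolls back the whole transaction

    DECLARE @UserLevelId int;

    BEGIN TRANSACTION;

    -- 1. Resolve the level by its text so the caller never passes the int manually.
    SELECT @UserLevelId = UserLevelId
    FROM CapInvUserLevel
    WHERE [Level] = @Level;

    IF @UserLevelId IS NULL
    BEGIN
        INSERT INTO CapInvUserLevel ([Level]) VALUES (@Level);
        SET @UserLevelId = SCOPE_IDENTITY();
    END;

    -- 2. Insert the user, pointing at the level row.
    INSERT INTO CapInvUser (Name, AreaId, Account, mail, UserLevelId)
    VALUES (@Name, @AreaId, @Account, @Mail, @UserLevelId);

    -- 3. Insert the registry row.
    INSERT INTO CapInvUserRegistry (UserLevelId, DateRegistry, RegistryStatus)
    VALUES (@UserLevelId, CAST(GETDATE() AS date), @RegistryStatus);

    COMMIT TRANSACTION;
END;
```

It could then be called as EXEC RegisterCapInvUser @Name = 'Jane', @Level = 'Admin', @RegistryStatus = 'Active'; with the values here being placeholders.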
I am looking for some advice. I have a SQL Server table called AuditLog, and this table records any action or change that happens to our DB from our web application.
I am trying to build some reports, and any time I try to pull data from this table my query goes from running in seconds to 10+ minutes. Just doing a
select * from dbo.auditlog
takes over 2 hours.
The table has 77 million rows and is growing. My only thought at the moment is to add an index, but that would slow down inserts. I'm not sure how much it would affect performance, so I have held back on it. I also considered partitioning the table or using an indexed view, but we are running SQL Server 2014 Standard Edition and those options are not supported.
Here is the table create statement:
CREATE TABLE [dbo].[AuditLog]
(
[AuditLogId] [uniqueidentifier] NOT NULL,
[UserId] [uniqueidentifier] NULL,
[EventDateUtc] [datetime] NOT NULL,
[EventType] [char](1) NOT NULL,
[TableName] [nvarchar](100) NOT NULL,
[RecordId] [nvarchar](100) NOT NULL,
[ColumnName] [nvarchar](100) NOT NULL,
[OriginalValue] [nvarchar](max) NULL,
[NewValue] [nvarchar](max) NULL,
[Rams1RecordID] [uniqueidentifier] NULL,
[Rams1AuditHistoryID] [uniqueidentifier] NULL,
[Rams1UserID] [uniqueidentifier] NULL,
[CreatedBy] [uniqueidentifier] NULL,
[CreatedDate] [datetime] NULL DEFAULT (getdate()),
[OriginalValueNiceName] [nvarchar](100) NULL,
[NewValueNiceName] [nvarchar](100) NULL,
CONSTRAINT [PK_AuditLog]
PRIMARY KEY CLUSTERED ([TableName] ASC, [RecordId] ASC, [AuditLogId] ASC)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[AuditLog] WITH NOCHECK
ADD CONSTRAINT [FK_AuditLog_User]
FOREIGN KEY([UserId]) REFERENCES [dbo].[User] ([UserID])
GO
ALTER TABLE [dbo].[AuditLog] CHECK CONSTRAINT [FK_AuditLog_User]
GO
ALTER TABLE [dbo].[AuditLog] WITH NOCHECK
ADD CONSTRAINT [FK_AuditLog_UserCreatedBy]
FOREIGN KEY([CreatedBy]) REFERENCES [dbo].[User] ([UserID])
GO
ALTER TABLE [dbo].[AuditLog] CHECK CONSTRAINT [FK_AuditLog_UserCreatedBy]
GO
With something that big there are a couple of things you might try.
The first thing you need to do is define how you are accessing the table MOST of the time and index accordingly.
I would hope you are not doing a select * from AuditLog without any filtering for a reporting solution - it shouldn't even be an option.
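To make "index accordingly" concrete, here is a sketch that assumes the reports mostly filter on a date range over EventDateUtc. That access pattern, the index name, and the date literals are assumptions; pick the key and included columns from whatever your reports actually filter and return.

```sql
-- Assumed access pattern: reports read a date range and a handful of narrow columns.
-- The clustered key columns (TableName, RecordId, AuditLogId) are carried in every
-- nonclustered index automatically, so they do not need to be listed here.
CREATE NONCLUSTERED INDEX IX_AuditLog_EventDateUtc
ON dbo.AuditLog (EventDateUtc)
INCLUDE (UserId, EventType, ColumnName);

-- A report query this index can answer without touching the nvarchar(max) columns.
SELECT EventDateUtc, UserId, EventType, TableName, RecordId, ColumnName
FROM dbo.AuditLog
WHERE EventDateUtc >= '20160101'
  AND EventDateUtc <  '20160201';
```

Every extra index does add some cost to each insert, so it is worth measuring insert throughput before and after on a copy of the table.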
Finally, instead of indexed views or partitioning, you might consider a partitioned view.
A partitioned view basically breaks your table up physically into smaller, meaningful tables, based on date or type or object or however you are MOST often accessing it. Each table is then indexed separately, giving you much better stats, and if you are on 2012 or higher (and your edition supports it) you can take advantage of ColumnStore, assuming you use something like a DATE to group the data.
Create a view that spans all of the tables and then report based on the view. Since you already grouped your data by how you MOST often will access it, your filter will act similarly to partition exclusion and get you to your data faster.
Of course this will result in a little more maintenance and some code changes, but it will be well worth the effort if you are storing that much data and more in a single table.
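A partitioned view along those lines could be sketched as below. It assumes the data is split by year on EventDateUtc and uses a cut-down column list to keep the example short; the table names, view name, and year boundaries are all hypothetical. The CHECK constraints are what let the optimizer skip member tables that cannot contain the requested dates.

```sql
-- One member table per year; the CHECK constraint defines which rows it may hold.
CREATE TABLE dbo.AuditLog_2015
(
    AuditLogId   uniqueidentifier NOT NULL,
    EventDateUtc datetime NOT NULL
        CONSTRAINT CK_AuditLog_2015_EventDateUtc
        CHECK (EventDateUtc >= '20150101' AND EventDateUtc < '20160101'),
    TableName    nvarchar(100) NOT NULL,
    RecordId     nvarchar(100) NOT NULL,
    ColumnName   nvarchar(100) NOT NULL,
    NewValue     nvarchar(max) NULL,
    CONSTRAINT PK_AuditLog_2015 PRIMARY KEY CLUSTERED (EventDateUtc, AuditLogId)
);

CREATE TABLE dbo.AuditLog_2016
(
    AuditLogId   uniqueidentifier NOT NULL,
    EventDateUtc datetime NOT NULL
        CONSTRAINT CK_AuditLog_2016_EventDateUtc
        CHECK (EventDateUtc >= '20160101' AND EventDateUtc < '20170101'),
    TableName    nvarchar(100) NOT NULL,
    RecordId     nvarchar(100) NOT NULL,
    ColumnName   nvarchar(100) NOT NULL,
    NewValue     nvarchar(max) NULL,
    CONSTRAINT PK_AuditLog_2016 PRIMARY KEY CLUSTERED (EventDateUtc, AuditLogId)
);
GO

-- The view spans all member tables; reports query the view, not the tables.
CREATE VIEW dbo.AuditLogAll
AS
SELECT AuditLogId, EventDateUtc, TableName, RecordId, ColumnName, NewValue
FROM dbo.AuditLog_2015
UNION ALL
SELECT AuditLogId, EventDateUtc, TableName, RecordId, ColumnName, NewValue
FROM dbo.AuditLog_2016;
GO

-- A filter on the partitioning column only needs to read the 2016 table.
SELECT AuditLogId, EventDateUtc, TableName, RecordId, ColumnName
FROM dbo.AuditLogAll
WHERE EventDateUtc >= '20160301'
  AND EventDateUtc <  '20160401';
```

Adding a new year then means creating one more member table and altering the view, which is the extra maintenance mentioned above.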
I'm developing a solution for SQL Server 2012 Express and Developer editions.
I will receive an XML document in a stored procedure. In the stored procedure I will parse the XML and insert its data into a table.
My problem here is that the XML could contain data that already exists in the table, and I need to update the data in the table with the new values.
I don't want to check whether each row in the XML exists in the table.
I think I can use IGNORE_DUP_KEY but I'm not sure.
How can I update or insert new data without checking it?
This is the table where I want to insert (or update) the new data:
CREATE TABLE [dbo].[CODES]
(
[ID_CODE] [bigint] IDENTITY(1,1) NOT NULL,
[CODE_LEVEL] [tinyint] NOT NULL,
[CODE] [nvarchar](20) NOT NULL,
[COMMISIONING_FLAG] [tinyint] NOT NULL,
[IS_TRANSMITTED] [bit] NOT NULL,
[TIMESPAN] [datetime] NULL,
[USERNAME] [nvarchar](50) NULL,
[SOURCE] [nvarchar](50) NULL,
[REASON] [nvarchar](200) NULL
CONSTRAINT [PK_CODES] PRIMARY KEY CLUSTERED
(
[CODE_LEVEL] ASC,
[CODE] ASC
)
)
The "IGNORE_DUP_KEY" option only skips inserting a new row if its key already exists; it does not update the existing row in that case.
The solution to your request is MERGE, or separate DML operations (UPDATE followed by INSERT).
BTW,
"IGNORE_DUP_KEY" checks for existence on the index key only (the indexed columns).
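A MERGE-based version of the procedure might look roughly like this. The procedure name and the XML layout (a Codes root with one Code element per row and child elements named after the values) are assumptions, since the question does not show the XML; adjust the nodes() and value() paths to the real document.

```sql
-- Hypothetical procedure; assumes XML shaped like:
-- <Codes><Code><CodeLevel>1</CodeLevel><CodeValue>ABC</CodeValue>...</Code></Codes>
CREATE PROCEDURE dbo.UpsertCodes
    @xml xml
AS
BEGIN
    SET NOCOUNT ON;

    MERGE dbo.CODES AS tgt
    USING
    (
        -- Shred the XML into one row per Code element.
        SELECT
            c.value('(CodeLevel/text())[1]',        'tinyint')       AS CODE_LEVEL,
            c.value('(CodeValue/text())[1]',        'nvarchar(20)')  AS CODE,
            c.value('(CommisioningFlag/text())[1]', 'tinyint')       AS COMMISIONING_FLAG,
            c.value('(IsTransmitted/text())[1]',    'bit')           AS IS_TRANSMITTED,
            c.value('(Timespan/text())[1]',         'datetime')      AS [TIMESPAN],
            c.value('(Username/text())[1]',         'nvarchar(50)')  AS USERNAME,
            c.value('(Source/text())[1]',           'nvarchar(50)')  AS [SOURCE],
            c.value('(Reason/text())[1]',           'nvarchar(200)') AS REASON
        FROM @xml.nodes('/Codes/Code') AS x(c)
    ) AS src
    -- Match on the primary key of CODES.
    ON  tgt.CODE_LEVEL = src.CODE_LEVEL
    AND tgt.CODE       = src.CODE
    WHEN MATCHED THEN
        UPDATE SET
            tgt.COMMISIONING_FLAG = src.COMMISIONING_FLAG,
            tgt.IS_TRANSMITTED    = src.IS_TRANSMITTED,
            tgt.[TIMESPAN]        = src.[TIMESPAN],
            tgt.USERNAME          = src.USERNAME,
            tgt.[SOURCE]          = src.[SOURCE],
            tgt.REASON            = src.REASON
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CODE_LEVEL, CODE, COMMISIONING_FLAG, IS_TRANSMITTED,
                [TIMESPAN], USERNAME, [SOURCE], REASON)
        VALUES (src.CODE_LEVEL, src.CODE, src.COMMISIONING_FLAG, src.IS_TRANSMITTED,
                src.[TIMESPAN], src.USERNAME, src.[SOURCE], src.REASON);
END;
```

The existence check still happens, but the engine does it once for the whole set instead of your code doing it row by row. Note that MERGE fails if the XML contains the same (CODE_LEVEL, CODE) pair twice, so duplicates in the input would need to be removed first.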
I'd like to sync an on-premise SQL Server 2012 SP2 database to Azure using SQL Data Sync.
When I try to do the sync I get an "Unsupported Data Type" error on one of the tables for the ID_Index column.
The Azure Management Portal gives no further explanation for the error.
The table creation script:
CREATE TABLE [dbo].[FlightPlanData](
[ID] [uniqueidentifier] NOT NULL CONSTRAINT [DF_FlightPlanData_ID] DEFAULT (newid()),
[Airline_ID] [int] NOT NULL,
[FlightID_FK] [uniqueidentifier] NOT NULL,
[FlightPlanID] [int] NOT NULL,
[DateInserted] [datetime] NOT NULL CONSTRAINT [DF_FlightPlanData_DateInserted] DEFAULT (getdate()),
[Type] [varchar](20) NOT NULL CONSTRAINT [DF_FlightPlanData_Type] DEFAULT (''),
[FileName] [varchar](100) NOT NULL CONSTRAINT [DF_FlightPlanData_FileName] DEFAULT (''),
[ClientID_FK] [uniqueidentifier] NULL,
[ID_Index] [int] IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_FlightPlanData] PRIMARY KEY NONCLUSTERED ([ID] ASC))
CREATE CLUSTERED INDEX [IX_FlightPlanData] ON [dbo].[FlightPlanData]([ID_Index] ASC)
The table has a GUID primary key, but it's not clustered; instead we use a clustered index on ID_Index.
I can't remove the ID_Index column, and I'd prefer not to make it the primary key. Is there any way to solve this?
I heard Azure requires a clustered index for each table, but it doesn't have to be the primary key. So what's the problem here?
A table cannot have an identity column that is not the primary key. This is one of the general requirements of SQL Data Sync. For more information, please visit this GitHub documentation.
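If it helps to see which other tables would hit the same rule before setting up the sync, a catalog query along these lines lists identity columns that are not part of their table's primary key. This is only a diagnostic sketch built on the standard sys catalog views; it does not change anything.

```sql
-- Lists every identity column that is not a primary key column.
SELECT
    s.name  AS SchemaName,
    t.name  AS TableName,
    ic.name AS IdentityColumn
FROM sys.identity_columns AS ic
JOIN sys.tables  AS t ON t.object_id = ic.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE NOT EXISTS
(
    SELECT 1
    FROM sys.indexes AS i
    JOIN sys.index_columns AS ixc
        ON  ixc.object_id = i.object_id
        AND ixc.index_id  = i.index_id
    WHERE i.object_id      = ic.object_id
      AND i.is_primary_key = 1
      AND ixc.column_id    = ic.column_id
)
ORDER BY s.name, t.name;
```

For the table in the question this would return dbo.FlightPlanData with ID_Index, since the primary key is on ID.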
I have 2 tables called login and Roles.
In the login table, I have these fields:
CREATE TABLE [dbo].[login]
([Id] [int] IDENTITY(1,1) NOT NULL,
[Uname] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[Pwd] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
CONSTRAINT [PK_login_1] PRIMARY KEY CLUSTERED([Uname] ASC))
In the roles table I have these fields:
CREATE TABLE [dbo].[Roles]
([Uname] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[Valid] [int] NOT NULL
)
Now what I need is this: if I fill in Uname with some value such as xyz, I would like the same Uname to be filled in automatically in the corresponding field of the Roles table, which I made a foreign key...
You could do this using a trigger. You may or may not want to execute this code on an insert and/or an update. Further details on triggers can be found here
CREATE TRIGGER trgInsertUserIntoRoles ON Login
FOR Insert
AS
INSERT INTO Roles (UName, Valid)
SELECT Uname, 1
FROM Inserted
Although I think it would be better if you just added the code to insert the username into the Roles table within the Stored Procedure to create the user.
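A sketch of that stored procedure approach, using the two tables from the question, could look like this. The procedure name is made up, and the default of 1 for Valid simply mirrors the trigger above.

```sql
-- Hypothetical procedure: creates the login and its Roles row together.
CREATE PROCEDURE dbo.CreateLoginWithRole
    @Uname varchar(50),
    @Pwd   varchar(50)
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- if either insert fails, neither row is kept

    BEGIN TRANSACTION;

    INSERT INTO dbo.[login] (Uname, Pwd)
    VALUES (@Uname, @Pwd);

    -- Same Uname goes into Roles; Valid = 1 matches the trigger example.
    INSERT INTO dbo.Roles (Uname, Valid)
    VALUES (@Uname, 1);

    COMMIT TRANSACTION;
END;
```

Compared with the trigger, this keeps the second insert visible in the code that creates the user, which tends to be easier to debug later.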
Also, you are aware that you are creating all this on the master database?
A solution is to put a trigger on inserts to the original table.
This Microsoft article on triggers will tell you how they work.