I have a table Users:
[UserId] [int] IDENTITY(1,1) NOT NULL,
[UserName] [nvarchar](20) NOT NULL,
[Email] [nvarchar](100) NOT NULL,
[Password] [nvarchar](128) NOT NULL,
[PasswordSalt] [nvarchar](128) NOT NULL,
[Comments] [nvarchar](256) NULL,
[CreatedDate] [datetime] NOT NULL,
[LastModifiedDate] [datetime] NULL,
[LastLoginDate] [datetime] NOT NULL,
[LastLoginIp] [nvarchar](40) NULL,
[IsActivated] [bit] NOT NULL,
[IsLockedOut] [bit] NOT NULL,
[LastLockedOutDate] [datetime] NOT NULL,
[LastLockedOutReason] [nvarchar](256) NULL,
[NewPasswordKey] [nvarchar](128) NULL,
[NewPasswordRequested] [datetime] NULL,
[NewEmail] [nvarchar](100) NULL,
[NewEmailKey] [nvarchar](128) NULL,
[NewEmailRequested] [datetime] NULL
This table has a one-to-one relationship with Profiles:
[UserId] [int] NOT NULL,
[FirstName] [nvarchar](25) NULL,
[LastName] [nvarchar](25) NULL,
[Sex] [bit] NULL,
[BirthDay] [smalldatetime] NULL,
[MartialStatus] [int] NULL
I need to connect users to all the other tables in the database, so is it better to:
1) Create the relationships from Users to the other tables?
2) Create the relationships from Profiles to the other tables?
Since the [Users] table contains the identity column and is therefore where the [UserId] value originates, I would create all the foreign keys back to it. From a performance standpoint, assuming the clustered index on both tables is on the [UserId] column, there should be very little performance impact.
Technically, I suppose the [Users] table could contain more data per row, so its index could span more pages and lookups could differ by milliseconds, but I think it makes more sense to relate everything back to the table that creates the [UserId] and is similarly named. That said, you can really do either.
If the PK of Profiles is a FK to Users, I would maintain consistency and use Users as the parent table in other relationships across the database.
However, if it is a true one-to-one relationship and not a one-to-zero-or-one relationship, it doesn't matter.
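A minimal sketch of the "Users as parent" option follows. It assumes both tables live in the dbo schema, that Users already has a primary key on UserId, and that Profiles has no primary key yet (the question does not show either). The Orders table and all constraint names are hypothetical and only stand in for "any other table".

```sql
-- Profiles: UserId is both the primary key and a foreign key to Users,
-- which is what makes the relationship one-to-(zero-or-)one.
ALTER TABLE dbo.Profiles
    ADD CONSTRAINT PK_Profiles PRIMARY KEY CLUSTERED (UserId);

ALTER TABLE dbo.Profiles
    ADD CONSTRAINT FK_Profiles_Users
    FOREIGN KEY (UserId) REFERENCES dbo.Users (UserId);

-- Hypothetical child table: it references Users, the table that generates UserId.
CREATE TABLE dbo.Orders
(
    OrderId     int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Orders PRIMARY KEY,
    UserId      int NOT NULL
        CONSTRAINT FK_Orders_Users REFERENCES dbo.Users (UserId),
    CreatedDate datetime NOT NULL
);
```

Because Profiles shares the same UserId value, a query can still join the child table directly to Profiles on UserId even though the foreign key points at Users.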
Another consideration is how the data in this database is accessed by any applications. Do the applications use an OR/M like Entity Framework which is aware of FK relationships? If so, consider using whichever table has columns which will most commonly be accessed by queries based on the child tables. For example, an application might display Profiles.LastName and Profiles.FirstName all over the place and very rarely read anything from the Users table. In this situation, you will save your database some I/O and save your developers some keystrokes by building relationships off the Profiles table.
Can someone help me figure out why I cannot query an external table that I created using SQL Server Management Studio? I can see the external table if I expand External Tables, but if I right-click it and choose Select Top 1000 Rows I get the error Invalid object name 'dbo.AuditLogSource'.
I am trying to copy a certain amount of data from an audit log table, DB1.AuditLog, into ArchiveDB.AuditLog. I've followed the tutorials on how to use Elastic Queries to achieve this simple task, but I am now stuck at the point where I should query the external table created locally in my ArchiveDB. Here's the process I followed; maybe I made a mistake somewhere:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '2019MoxvE!';
--DROP MASTER KEY;
CREATE DATABASE SCOPED CREDENTIAL SQL_Credential
WITH IDENTITY = 'myusername',
SECRET = '2019MoxvE!';
--DROP DATABASE SCOPED CREDENTIAL SQL_Credential;
CREATE EXTERNAL DATA SOURCE RemoteReferenceData
WITH
(
TYPE=RDBMS,
LOCATION='ourserver.database.windows.net',
DATABASE_NAME='DB1',
CREDENTIAL= SQL_Credential
);
--DROP EXTERNAL DATA SOURCE RemoteReferenceData;
CREATE EXTERNAL TABLE [dbo].[AuditLogSource]
(
[Id] [int] NOT NULL,
[Userid] [int] NOT NULL,
[ObjectId] [int] NULL,
[CreatedOn] [datetime] NOT NULL,
[ModifiedOn] [datetime] NOT NULL,
[ModifiedBy] [varchar](150) NOT NULL,
[Type] [int] NOT NULL,
[ActionTable] [varchar](50) NOT NULL,
[IsAjaxRequest] [bit] NOT NULL,
[Parameters] [varchar](max) NOT NULL,
[Controller] [varchar](50) NOT NULL,
[Action] [varchar](50) NOT NULL,
[Comments] [varchar](max) NULL,
[BeforeImage] [varchar](max) NULL,
[AfterImage] [varchar](max) NULL,
[Browser] [varchar](max) NULL
)
WITH (DATA_SOURCE = [RemoteReferenceData]);
--DROP EXTERNAL TABLE [dbo].[AuditLogSource];
INSERT INTO [dbo].[AuditLog]
SELECT al.* FROM [dbo].[AuditLogSource] al WHERE al.[CreatedOn] <= '2020/12/31' AND
NOT EXISTS(SELECT 1 FROM [dbo].[AuditLog] al1 WHERE al1.Id=al.Id);
The query editor highlights no errors in this query, which suggests it recognises that the AuditLogSource table exists, but when I run the query it complains that the table does not exist. I can also confirm that the user I am logged into the database with is the admin user and the owner of both DB1 and ArchiveDB. What can I do to make this work?
Thanks in advance.
Make sure you're using the correct database, or qualify the object with a three-part name, for example [DatabaseName].[Schema].[TableName].
Also, when you create a new SQL Server object, it is not added to the IntelliSense local cache automatically, and because of this the editor can show Invalid object name 'dbo.AuditLogSource'. Please see the references below.
Try:
Edit -> IntelliSense -> Refresh Local Cache, or Ctrl+Shift+R
Reference:
Sql server invalid object name - but tables are listed in SSMS tables list
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-ver16&tabs=dedicated
OK, I will post an answer to this question in case someone else comes across the same or a similar problem. I made only one mistake when creating the external table, and it came from the tutorials and other answers I saw on this very platform.
CREATE EXTERNAL TABLE [dbo].[AuditLogSource]
(
[Id] [int] NOT NULL,
[Userid] [int] NOT NULL,
[ObjectId] [int] NULL,
[CreatedOn] [datetime] NOT NULL,
[ModifiedOn] [datetime] NOT NULL,
[ModifiedBy] [varchar](150) NOT NULL,
[Type] [int] NOT NULL,
[ActionTable] [varchar](50) NOT NULL,
[IsAjaxRequest] [bit] NOT NULL,
[Parameters] [varchar](max) NOT NULL,
[Controller] [varchar](50) NOT NULL,
[Action] [varchar](50) NOT NULL,
[Comments] [varchar](max) NULL,
[BeforeImage] [varchar](max) NULL,
[AfterImage] [varchar](max) NULL,
[Browser] [varchar](max) NULL
)
WITH
(
DATA_SOURCE = [RemoteReferenceData],
SCHEMA_NAME = 'dbo', -- I missed this part
OBJECT_NAME = 'AuditLog' -- I missed this part
);
So my problem was that I had omitted SCHEMA_NAME = 'dbo' and OBJECT_NAME = 'AuditLog', which point the external table at the AuditLog table in DB1. With the definition in my original post, Azure was looking for a table named AuditLogSource in DB1, which doesn't exist, hence the error I was getting. It would have helped if the CREATE EXTERNAL TABLE statement had failed in the first place, because that would have shown that something was wrong. Anyway, I hope this helps someone.
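One way to check what an external table actually points at, without relying on the editor, is to read the catalog views in ArchiveDB. This is a sketch that assumes the elastic query (RDBMS) metadata columns remote_schema_name and remote_object_name are available, as they are in Azure SQL Database; the column aliases are only illustrative.

```sql
-- Confirm which database the session is connected to.
SELECT DB_NAME() AS current_database;

-- List each external table with its local name, the remote object it maps to,
-- and the data source (server and database) it goes through.
SELECT
    SCHEMA_NAME(t.schema_id) AS local_schema,
    t.name                   AS local_table,
    t.remote_schema_name,
    t.remote_object_name,
    ds.name                  AS data_source,
    ds.location,
    ds.database_name
FROM sys.external_tables AS t
JOIN sys.external_data_sources AS ds
    ON ds.data_source_id = t.data_source_id;
```

If the remote object shown here does not match the table name in DB1, queries against the external table are looking for the wrong object.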
I have created a table which has the columns
Id, name, CountryId, CreatedOn, CreatedBy, UpdatedOn, UpdatedBy
but when I open the table in the designer, there is no CountryId.
When I script the created table, it shows CountryId.
The script is as follows:
CREATE TABLE [dbo].[State](
[Id] [int] NOT NULL,
[name] [nvarchar](255) NOT NULL,
[CountryId] [int] NOT NULL,
[CreatedOn] [datetime] NULL,
[CreatedBy] [nvarchar](50) NULL,
[UpdatedOn] [datetime] NULL,
[UpdatedBy] [nvarchar](50) NULL,
);
Also, when I right-click the table and select the top 100 rows, CountryId is there.
What can be the reason that the designer is not showing the column?
EDIT:
As suggested by shnugo, I was able to solve it by closing and re-opening SSMS 2014.
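To confirm what the server itself holds, independent of what the designer displays, the column list can be read from the catalog. A small sketch, using the dbo.State table from the script above:

```sql
-- Columns of dbo.State as stored in the database, in definition order.
SELECT
    c.column_id,
    c.name,
    TYPE_NAME(c.user_type_id) AS type_name,
    c.max_length,
    c.is_nullable
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.State')
ORDER BY c.column_id;
```

If CountryId appears here, the table is fine and the designer is simply showing stale metadata.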
I have the following database design:
TABLE [Document]
[DocumentId] [int] NOT NULL, --Primary Key
[Status] [bit] NULL,
[Text] [nvarchar](max) NULL,
[FolderPath] [nvarchar](max) NULL
TABLE [Metadata]
[MetadataId] [int] IDENTITY(1,1) NOT NULL, -- Primary Key
[DocumentId] [int] NOT NULL, -- Foreign Key Document.DocumentId (1:1 relationship)
[Title] [nvarchar](250) NOT NULL,
[Author] [nvarchar](250) NOT NULL
TABLE [Page]
[PageId] [int] IDENTITY(1,1) NOT NULL, -- Primary Key
[DocumentId] [int] NOT NULL, -- Foreign Key Document.DocumentId (1:N Relationship)
[Number] [int] NOT NULL,
[ImagePath] [nvarchar](max) NULL,
[PageText] [nvarchar](max) NOT NULL
TABLE [Word]
[WordId] [int] IDENTITY(1,1) NOT NULL, -- Primary Key
[PageId] [int] NOT NULL, -- Foreign Key Page.PageId (1:N Relationship)
[Text] [nvarchar](50) NOT NULL
TABLE [Keyword]
[KeywordId] [int] IDENTITY(1,1) NOT NULL, -- Primary Key
[Word] [nvarchar](50) NOT NULL
TABLE [DocumentKeyword]
[Document_DocumentId] [int] NOT NULL, -- Foreign Key Document.DocumentId (N:N Relationship)
[Keyword_KeywordId] [int] NOT NULL -- Foreign Key Keyword.KeywordId
I'm using Entity Framework Code First to create the database.
Should I be normalizing my database design further, i.e. creating link tables between Document and Page, Document and Metadata, etc.? If so, is there a way to get Entity Framework to create the relationship tables for me, so that I don't have to include them in my models? I'm trying to learn to do this the right and most efficient way possible.
Thank you.
Well I can't immediately answer your question, but I have some thoughts that might improve your design:
1) A document (in real life, at least) can be written by more than one author. This means that your 1:1 relationship from Document to Metadata should be a 1:n relationship (unless you can prove that there will never be a situation where there's more than one author).
2) The title of a document is (in my view) more a property of the document than a piece of metadata (also having point 1 in mind).
3) What does this Word table do?
4) The column Keyword_KeywordId should be called plainly KeywordId if you want to be consistent in your naming. The same applies to Document_DocumentId.
For the rest it looks pretty normalized.
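In plain T-SQL, the naming and multiple-author suggestions could be sketched as follows. This is DDL written by hand for illustration, not what Entity Framework Code First would generate. It assumes [Document] and [Keyword] already exist with the primary keys shown in the question; the [DocumentAuthor] table and all constraint names are hypothetical.

```sql
-- Link table with the plainer column names suggested above.
CREATE TABLE [DocumentKeyword](
    [DocumentId] [int] NOT NULL,
    [KeywordId]  [int] NOT NULL,
    -- The composite key also prevents the same keyword being linked twice.
    CONSTRAINT [PK_DocumentKeyword] PRIMARY KEY ([DocumentId], [KeywordId]),
    CONSTRAINT [FK_DocumentKeyword_Document]
        FOREIGN KEY ([DocumentId]) REFERENCES [Document]([DocumentId]),
    CONSTRAINT [FK_DocumentKeyword_Keyword]
        FOREIGN KEY ([KeywordId]) REFERENCES [Keyword]([KeywordId])
);

-- Hypothetical 1:n table so that a document can have several authors.
CREATE TABLE [DocumentAuthor](
    [DocumentAuthorId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [DocumentId]       [int] NOT NULL,
    [Author]           [nvarchar](250) NOT NULL,
    CONSTRAINT [FK_DocumentAuthor_Document]
        FOREIGN KEY ([DocumentId]) REFERENCES [Document]([DocumentId])
);
```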
I've got a client portal project (the first one I've developed so a basic best practice is what I'm looking for here, nothing fancy) nearing first release.
A simplification of the main record types used in reporting is the following:
CREATE TABLE [dbo].[conversions](
[conversion_id] [nvarchar](128) primary key NOT NULL,
[click_id] [int] NULL,
[conversion_date] [datetime] NOT NULL,
[last_updated] [datetime] NULL,
[click_date] [datetime] NULL,
[affiliate_affiliate_id] [int] NOT NULL,
[advertiser_advertiser_id] [int] NOT NULL,
[offer_offer_id] [int] NOT NULL,
[creative_creative_id] [int] NOT NULL,
[conversion_type] [nvarchar](max) NULL)
CREATE TABLE [dbo].[clicks](
[click_id] [int] primary key NOT NULL,
[click_date] [datetime] NOT NULL,
[affiliate_affiliate_id] [int] NOT NULL,
[advertiser_advertiser_id] [int] NOT NULL,
[offer_offer_id] [int] NOT NULL,
[campaign_id] [int] NOT NULL,
[creative_creative_id] [int] NOT NULL,
[ip_address] [nvarchar](max) NULL,
[user_agent] [nvarchar](max) NULL,
[referrer_url] [nvarchar](max) NULL,
[region_region_code] [nvarchar](max) NULL,
[total_clicks] [int] NOT NULL)
My specific question is: given millions of rows in each table, what mechanism is used to serve up summary reports quickly on demand given you know all the possible reports that can be requested?
As a starting point, raw queries against 18 months' worth of data for the busiest client yield 3 to 5 seconds of latency on my dashboard, and the worst case is upwards of 10 seconds for a summary report with a custom date range spanning all the rows.
I know I can cache them after the first hit, but I want snappy performance on the first hit.
My feeling is that this is a fundamental aspect of an application of this nature and that there are tons of applications like this out there, so is there a well-thought-out method for pre-calculating tables that already hold the grouped and aggregated data? How do you then keep them up to date? Do you use SQL Agent and custom console apps that brute-force the calculations beforehand?
Any general pointers would be much appreciated.
Both tables are time series. They seem to be clustered by an ID column, which has little value for how time series are queried. Time series are almost always queried by date range, so your clustered organization should serve this type of query first and foremost: cluster by date, and move the ID primary key constraint to a nonclustered index.
CREATE TABLE [dbo].[conversions](
[conversion_id] [nvarchar](128) NOT NULL,
[conversion_date] [datetime] NOT NULL,
...
constraint [pk_conversions] primary key nonclustered ([conversion_id]))
go
create clustered index [cdx_conversions] on [dbo].[conversions]([conversion_date]);
go
CREATE TABLE [dbo].[clicks](
[click_id] [int] NOT NULL,
[click_date] [datetime] NOT NULL,
...
constraint [pk_clicks] primary key nonclustered ([click_id]));
go
create clustered index [cdx_clicks] on [dbo].[clicks]([click_date]);
This model will serve the typical queries that filter by a range on [click_date] and on [conversion_date]. For any other query the answer will be very specific to your query.
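As an illustration of the kind of query this layout favors, here is a sketch of a daily summary over a date range. The date values are placeholders and the grouping columns are an assumption about what a report might need. The half-open range (>= start, < end) lets the clustered index on [conversion_date] be used for a range seek.

```sql
DECLARE @from datetime = '20200101';  -- inclusive start (placeholder value)
DECLARE @to   datetime = '20200201';  -- exclusive end (placeholder value)

SELECT
    CAST(c.[conversion_date] AS date) AS conversion_day,
    c.[affiliate_affiliate_id],
    COUNT(*) AS conversions
FROM [dbo].[conversions] AS c
WHERE c.[conversion_date] >= @from
  AND c.[conversion_date] <  @to
GROUP BY CAST(c.[conversion_date] AS date), c.[affiliate_affiliate_id]
ORDER BY conversion_day, c.[affiliate_affiliate_id];
```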
There are limits on how useful a relational row organized model can be for an OLAP/DW workload like yours. Specialized tools do a better job at it. Columnstore indexes can deliver amazingly fast responses, but they are difficult to update. Creating a MOLAP cube can also deliver blazing results but that is a serious project undertaking. There are even specialized time series databases out there.
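For the columnstore option, a nonclustered columnstore index over the reporting columns might look like the sketch below. The index name is hypothetical and the column list is an assumption about which columns the reports aggregate. On SQL Server 2012 and 2014 such an index makes the table read-only, while on SQL Server 2016 and later the table stays updatable, so whether this is practical depends on the version in use.

```sql
-- Sketch only: columnstore copy of the columns that reports group and filter on.
-- The nvarchar(max) columns are deliberately left out of the index.
CREATE NONCLUSTERED COLUMNSTORE INDEX [ncci_clicks_reporting]
ON [dbo].[clicks]
(
    [click_date],
    [affiliate_affiliate_id],
    [advertiser_advertiser_id],
    [offer_offer_id],
    [campaign_id],
    [creative_creative_id],
    [total_clicks]
);
```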
I have the following table
CREATE TABLE [dbo].[LogFiles_Warehouse](
[id] [int] IDENTITY(1,1) NOT NULL,
[timestamp] [datetime] NOT NULL,
[clientNr] [int] NOT NULL,
[server] [nvarchar](150) COLLATE Latin1_General_CI_AS NOT NULL,
[storeNr] [int] NOT NULL,
[account] [nvarchar](50) COLLATE Latin1_General_CI_AS NOT NULL,
[software] [nvarchar](300) COLLATE Latin1_General_CI_AS NOT NULL,
CONSTRAINT [PK_Astoria_LogFiles_Warehouse] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
I want to avoid having duplicate rows in my table. I thought about creating a UNIQUE index on all the columns, but SQL Server Management Studio tells me that this is not possible because the key would be too large.
Is there another way I could enforce unique rows over all columns, apart from indexes?
Create a UNIQUE index on hashed values:
CREATE TABLE [dbo].[LogFiles_Warehouse]
(
[id] [int] IDENTITY(1,1) NOT NULL,
[timestamp] [datetime] NOT NULL,
[clientNr] [int] NOT NULL,
[server] [nvarchar](150) COLLATE Latin1_General_CI_AS NOT NULL,
[storeNr] [int] NOT NULL,
[account] [nvarchar](50) COLLATE Latin1_General_CI_AS NOT NULL,
[software] [nvarchar](300) COLLATE Latin1_General_CI_AS NOT NULL,
serverHash AS CAST(HASHBYTES('MD4', server) AS BINARY(16)),
accountHash AS CAST(HASHBYTES('MD4', account) AS BINARY(16)),
softwareHash AS CAST(HASHBYTES('MD4', software) AS BINARY(16))
)
CREATE UNIQUE INDEX
UX_LogFilesWarehouse_Server_Account_Software
ON LogFiles_Warehouse (serverHash, accountHash, softwareHash)
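Note that the index above covers only the three hashed string columns, so two rows with the same server, account and software but different timestamps would be rejected as duplicates. If the goal is uniqueness across every column except id, one option is to add the remaining narrow columns to the key. This is a sketch under that assumption, not part of the original answer, and the index name is hypothetical.

```sql
-- Key size stays small: datetime (8) + int (4) + int (4) + 3 x binary(16).
-- Caveats: a hash collision would reject a row that is not a true duplicate,
-- and HASHBYTES is byte-based, so values that differ only in case hash
-- differently even though the column collation is case-insensitive.
CREATE UNIQUE INDEX UX_LogFilesWarehouse_AllColumns
ON dbo.LogFiles_Warehouse
(
    [timestamp],
    clientNr,
    storeNr,
    serverHash,
    accountHash,
    softwareHash
);
```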
Use triggers plus a smaller non-unique index over the most distinguishing fields to help alleviate the table scan problem.
This largely comes down to a bad database design to start with. Fields like Software and Account do not belong in that table (or, if Account does, then clientNr does not). Your table is only so wide because you already violate database design basics.
Also, to detect duplicates you must NOT include the Id field in the uniqueness test; otherwise you won't ever have duplicates to start with.
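The trigger approach could be sketched roughly like this. The index and trigger names are hypothetical, and the choice of [timestamp], clientNr and storeNr as the "most distinguishing" fields is an assumption. The comparison leaves out id, as noted above. Under the default isolation level two concurrent sessions could still insert the same row at the same moment, so treat this as a starting point rather than a guarantee.

```sql
-- Narrow helper index so the duplicate check does not scan the whole table.
CREATE INDEX IX_LogFilesWarehouse_DuplicateCheck
ON dbo.LogFiles_Warehouse ([timestamp], clientNr, storeNr);
GO

CREATE TRIGGER dbo.trg_LogFilesWarehouse_NoDuplicates
ON dbo.LogFiles_Warehouse
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- In an AFTER trigger the new rows are already in the table, so a duplicate
    -- is any other row (different id) that matches on every remaining column.
    IF EXISTS
    (
        SELECT 1
        FROM inserted AS i
        JOIN dbo.LogFiles_Warehouse AS w
            ON  w.[timestamp] = i.[timestamp]
            AND w.clientNr    = i.clientNr
            AND w.storeNr     = i.storeNr
            AND w.[server]    = i.[server]
            AND w.account     = i.account
            AND w.software    = i.software
            AND w.id         <> i.id
    )
    BEGIN
        RAISERROR('Duplicate row in LogFiles_Warehouse.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;
GO
```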