SQL Server - Computed column counting over partition

I am fairly new to creating tables in SQL Server, especially computed columns, and I want to make sure that I'm not creating a terribly inefficient database.
As a simplified example of what I'm trying to accomplish, suppose I have the following table definition:
CREATE TABLE [dbo].[MyTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Name] [varchar](50) NOT NULL,
[Strategy] [varchar](30) NOT NULL,
[Strategy_Variation] [int] NOT NULL
)
With simplified data looking as follows:
ID:  Name:   Strategy:  Strategy_Variation:
1    Name1   Strat1     1
2    Name2   Strat2     1
3    Name3   Strat2     2
4    Name4   Strat1     2
5    Name5   Strat1     3
Basically, my question is about the Strategy_Variation column. What I would like is for the variation to increment within each Strategy, based on the order in which rows were entered into the table (using the incrementing ID identity column to order the entries). Or, via SQL:
COUNT(Strategy) over(partition by Strategy order by ID)
My question is whether it is a good idea to have this as a computed column in my table definition, or whether I should leave this kind of column out completely and add it in a view, say, to keep this table leaner.
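If it helps, the view alternative I have in mind would look something like this (just a sketch; the view name is made up, and it assumes Strategy_Variation is dropped from the base table):
CREATE VIEW dbo.MyTableWithVariation
AS
SELECT ID,
       Name,
       Strategy,
       -- Numbers each row within its Strategy in ID (entry) order
       ROW_NUMBER() OVER (PARTITION BY Strategy ORDER BY ID) AS Strategy_Variation
FROM dbo.MyTable;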
Overall, I'm a newbie to this and would love any pointers as to how a seasoned DB admin would handle such a situation.
Thanks!

Related

Indexing columns in SQL Server

I have the following table
CREATE TABLE [dbo].[ActiveHistory]
(
[ID] [INT] IDENTITY(1,1) NOT NULL,
[Date] [VARCHAR](250) NOT NULL,
[ActiveID] [INT] NOT NULL,
[UserID] [INT] NOT NULL,
CONSTRAINT [PK_ActiveHistory]
PRIMARY KEY CLUSTERED ([ID] ASC)
)
About 600,000 rows are inserted into the table per day, which means roughly 300,000 distinct actives for one date and about 500 distinct users. I would like to keep about 5 years of history in one table, which means more than a billion rows; overall, about 4,000 distinct UserIDs and 1,000,000 distinct actives would end up in the 5-year table. It is very important for me to be able to work with this table quickly.
Most of the queries in the past used joins on Date and UserID, but recently I have had to include ActiveID quite often, and sometimes only two of those columns are used (any pair).
I never use ID in a join.
Right now I have a nonclustered index with UserID and Date as index key columns and ID and ActiveID as included columns. My question is how best to arrange the indexes for this table considering these new requirements. Simply adding an index for every option may use a huge amount of space, and sometimes another application that uses the same server suffers as CPU usage goes to 99%; I am not sure how new indexes would affect that.
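For reference, the kind of arrangement I have been considering looks something like the sketch below (index names are made up, and whether each one earns its keep would have to be verified against the real query mix and the space/CPU budget):
-- Covers queries filtering on UserID and Date, with ActiveID available without key lookups
CREATE NONCLUSTERED INDEX IX_ActiveHistory_UserID_Date
ON dbo.ActiveHistory (UserID, [Date])
INCLUDE (ActiveID);
-- Covers the newer queries that lead with ActiveID
CREATE NONCLUSTERED INDEX IX_ActiveHistory_ActiveID_Date
ON dbo.ActiveHistory (ActiveID, [Date])
INCLUDE (UserID);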

Query for finding the kid with the highest grade level in a school that is working on a project?

There's a school project that is given to kids in various grade levels (only one senior, though). Sophomores will mentor freshmen, juniors will mentor sophomores, and one senior will mentor the juniors.
I made the following tables:
CREATE TABLE [dbo].[School](
[Name] [varchar](50) NOT NULL,
[studentID] [varchar](50) NOT NULL PRIMARY KEY,
[MentorID] [varchar](50)
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[PROJECT](
[PID] [int] NOT NULL PRIMARY KEY,
[ProjectName] [varchar](30) NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[WorksOn](
[SID] [varchar](50) NOT NULL FOREIGN KEY REFERENCES School(studentID),
[ProjID] [int] NOT NULL FOREIGN KEY REFERENCES PROJECT(PID),
PRIMARY KEY (SID, ProjID)
) ON [PRIMARY]
GO
I'm trying to write a query that can retrieve the name of the highest level person working on a project. So project1 can have a freshman, 2 sophomores, and a junior. It will return the name of that junior.
I'm wondering if I should add [GradeLevel] to the school table? Would it make it easier?
Also, let's say project2 has one freshman and two sophomores. It should return the name of the two sophomores.
edit - the query I'm trying:
SELECT p.ProjectName AS "Project", s.Name AS "Highest ranking member/s"
FROM PROJECT p
inner JOIN School s
ON p.PID = s.ProjectID
where min(s.Level)-- level 1 being a senior, 2 a junior.
group by PROJECT.ProjectName
If Project1 has Bob (freshman), Jule (sophomore), and Betty (junior), then it should return Betty for Project1.
You have a flaw in your current design.
You currently have two different tables linking the studentID to the MentorID (btw, varchar(50) is probably not the best choice for an id column, but that's a different story).
If you want to keep track of who is mentoring whom, you can either keep that relationship in the School table or in the WorksOn table (a poor name, IMHO; a name like MentorToStudent would convey the meaning of this table much better).
If all you care about is the students' grade level, but you don't care about which particular junior is mentoring which particular sophomore, then instead of having the MentorID column you should remove the WorksOn table and simply add the GradeLevel column to the School table.
So, based on these choices, the query to get whoever has the highest grade level for a project will change: the first option would probably require a recursive CTE, while the second option would only require knowing how to order the grade levels (which is easy if you keep them in their own table with a name and an id - and that's probably a good idea in both options).
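For instance, with the second option the query could look something like the sketch below (it assumes a GradeLevel column is added to School, with 1 = senior, 2 = junior and so on, and that WorksOn still links students to projects):
SELECT ProjectName AS [Project], Name AS [Highest ranking member/s]
FROM (
    SELECT p.ProjectName,
           s.Name,
           -- Rank students within each project; RANK keeps ties, so two sophomores both show up
           RANK() OVER (PARTITION BY p.PID ORDER BY s.GradeLevel ASC) AS rnk
    FROM PROJECT AS p
    JOIN WorksOn AS w ON w.ProjID = p.PID
    JOIN School AS s ON s.studentID = w.SID
) AS ranked
WHERE rnk = 1;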

SQL Server - Order Identity Fields in Table

I have a table with this structure:
CREATE TABLE [dbo].[cl](
[ID] [int] IDENTITY(1,1) NOT NULL,
[NIF] [numeric](9, 0) NOT NULL,
[Name] [varchar](80) NOT NULL,
[Address] [varchar](100) NULL,
[City] [varchar](40) NULL,
[State] [varchar](30) NULL,
[Country] [varchar](25) NULL,
Primary Key([ID],[NIF])
);
Imagine that this table has 3 records: record 1, 2, 3...
Whenever I delete record number 2, the IDENTITY field generates a gap. The table then has record 1 and record 3. It's not correct!
Even if I use:
DBCC CHECKIDENT('cl', RESEED, 0)
It does not solve my problem because it will set the ID of the next inserted record to 1. And that's not correct either, because the table will then end up with duplicate IDs.
Does anyone have a clue about this?
No database is going to reseed or recalculate an auto-incremented field/identity to use values in between ids as in your example. This is impractical on many levels, but some examples may be:
Integrity - since a re-used id could mean records in other systems are referring to an old value when the new value is saved
Performance - trying to find the lowest gap for each value inserted
In MySQL, this is not really happening either (at least in InnoDB or MyISAM - are you using something different?). In InnoDB, the behavior is identical to SQL Server: the counter is managed outside of the table, so deleted values or rolled-back transactions leave gaps between the last value and the next insert. In MyISAM, the value is calculated at the time of insertion instead of being managed through an external counter. This calculation is what gives the perception of it being recalculated - it's just never calculated until actually needed (MAX(Id) + 1). Even this won't insert inside gaps (like the id = 2 in your example).
Many people will argue that if you need to use these gaps, then there is something that could be improved in your data model. You shouldn't ever need to worry about these gaps.
If you insist on using those gaps, your fastest method would be to log deletes in a separate table, then use an INSTEAD OF INSERT trigger to perform the inserts with your intended keys: first look for records in the deletions table to re-use (then delete them to prevent re-use), and then use MAX(Id) + 1 for any additional rows to insert.
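A rough sketch of that approach (table and trigger names are made up; it assumes ID becomes a plain int column rather than an IDENTITY, handles one row per INSERT statement, and ignores concurrency):
CREATE TABLE dbo.DeletedIds (ID int PRIMARY KEY);
GO
-- Log deleted IDs so they can be handed out again
CREATE TRIGGER tr_cl_log_deletes ON dbo.cl
AFTER DELETE
AS
    INSERT INTO dbo.DeletedIds (ID)
    SELECT ID FROM deleted;
GO
-- Re-use the lowest logged gap first, otherwise take MAX(ID) + 1
CREATE TRIGGER tr_cl_reuse_ids ON dbo.cl
INSTEAD OF INSERT
AS
BEGIN
    DECLARE @ID int;
    SELECT @ID = MIN(ID) FROM dbo.DeletedIds;
    IF @ID IS NULL
        SELECT @ID = ISNULL(MAX(ID), 0) + 1 FROM dbo.cl;
    ELSE
        DELETE FROM dbo.DeletedIds WHERE ID = @ID;
    INSERT INTO dbo.cl (ID, NIF, Name, Address, City, State, Country)
    SELECT @ID, NIF, Name, Address, City, State, Country
    FROM inserted;
END
GO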
I guess what you want is something like this:
create table dbo.cl
(
SurrogateKey int identity(1, 1)
primary key
not null,
ID int not null,
NIF numeric(9, 0) not null,
Name varchar(80) not null,
Address varchar(100) null,
City varchar(40) null,
State varchar(30) null,
Country varchar(25) null,
unique (ID, NIF)
)
go
I added a surrogate key so you'll have the best of both worlds. Now you just need a trigger on the table to "adjust" the ID whenever some prior ID gets deleted:
create trigger tr_on_cl_for_auto_increment on dbo.cl
after delete, update
as
begin
update c
set ID = d.New_ID
from dbo.cl as c
inner join (
select c2.SurrogateKey,
row_number() over (order by c2.SurrogateKey asc) as New_ID
from dbo.cl as c2
) as d
on c.SurrogateKey = d.SurrogateKey
end
go
Of course this solution also implies that you'll have to ensure (whenever you insert a new record) that you check for yourself which ID to insert next.
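For example, an insert under this scheme could pick the next ID itself, along these lines (a sketch with made-up values, ignoring concurrent inserts):
-- Take the current MAX(ID) + 1, or 1 if the table is empty
INSERT INTO dbo.cl (ID, NIF, Name)
SELECT ISNULL(MAX(ID), 0) + 1, 123456789, 'New client'
FROM dbo.cl;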

Composite increment column

I have a situation where I need a secondary column to be incremented by 1 for rows where the value of another column is the same.
Table schema:
CREATE TABLE [APP].[World]
(
[UID] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[App_ID] [bigint] NOT NULL,
[id] [bigint] NOT NULL,
[name] [varchar](255) NOT NULL,
[descript] [varchar](max) NULL,
[default_tile] [uniqueidentifier] NOT NULL,
[active] [bit] NOT NULL,
[inactive_date] [datetime] NULL
)
First off, I have UID which is wholly unique, no matter what App_ID is.
In my situation, I would like to have id be similar to Increment(1,1), only for the same App_ID.
Assumptions:
There are 3 App_Id: 1, 2, 3
Scenario:
App_ID 1 has 3 worlds
App_ID 2 has 5 worlds
App_ID 3 has 1 world
Ideal outcome:
App_ID  id
1       1
2       1
3       1
1       2
2       2
1       3
2       3
2       4
2       5
I was thinking of placing the increment logic in the insert stored procedure, but wanted to see if there is an easier or different way of producing the same result without a stored procedure.
I figure the available options are triggers or a stored procedure implementation, but wanted to make sure there isn't some edge-case pattern I am missing.
Update #1
Let's rethink this a little.
This is about having a PK UID and, ultimately, a partitioned column id over App_ID that is incremented by 1 with each new entry for the associated App_ID.
This would be similar to what ROW_NUMBER() produces, but without all the overhead of recalculating the value each time a new entry is inserted.
Also, App_ID and id both have the space and potential to be BIGINT; therefore, the number of possible combinations would be BIGINT x BIGINT.
This is not possible to implement the way you are asking for. As others have pointed out in comments to your original post, your database design would be a lot better off split up into multiple tables, which all have their own identities and utilize foreign key constraints where necessary.
However, if you are dead set on proceeding with this approach, I would make app_id an identity column and then derive the id column by first querying for
MAX(id)
for that app_id and then incrementing the result by 1. This kind of logic is suitable to implement in a stored procedure, which you should use for inserts anyway to protect against direct SQL injection and such. The query part of such a procedure could look like this:
INSERT INTO
[db].dbo.[yourtable]
(
app_id
, id
)
VALUES
(
@app_id
, (
SELECT
MAX(id) + 1
FROM
[db].dbo.[yourtable]
WHERE
app_id = @app_id
)
)
The performance impact of doing so, however, is up to you to assess.
Also, you need to consider how to properly handle the case where there are no previous rows for that app_id.
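For example, the empty case could be covered with ISNULL, along these lines (a sketch, using the same placeholder names as above):
INSERT INTO [db].dbo.[yourtable] (app_id, id)
VALUES (@app_id,
        (SELECT ISNULL(MAX(id), 0) + 1   -- falls back to 1 when no rows exist for this app_id
         FROM [db].dbo.[yourtable]
         WHERE app_id = @app_id));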
The simplest solution would be as below:
/* Adding a leading 0 to [App_ID] */
SELECT RIGHT(CONCAT('0000', (([App_ID] - (([App_ID] - 1) % 3)) / 3) + 1), 4) AS [App_ID]
I did a similar thing in some recent code; I hope the example below helps.
Explanation - in the code below, I take MAX() over the existing key values and handle the first-entry case with ISNULL, so the first row gets 1. In all other cases it adds 1 and gives a unique value. Based on requirements and needs, you can adapt the example code below. The WHILE loop is only there for the demo (it is not actually needed).
IF OBJECT_ID('dbo.Sample','U') IS NOT NULL
DROP TABLE dbo.Sample
CREATE TABLE [dbo].[Sample](
[Sample_key] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
[Student_Key] [int] UNIQUE NOT NULL,
[Notes] [varchar](100) NULL,
[Inserted_dte] [datetime] NOT NULL
)
DECLARE @A INT, @N INT
SET @A = 1
SET @N = 10
WHILE (@A <= @N)
BEGIN
INSERT INTO [dbo].[Sample]([Student_Key],[Notes],[Inserted_dte])
SELECT ISNULL((MAX([Student_Key])+1),1),'NOTES',GETDATE() FROM [dbo].[Sample]
SET @A += 1
END
SELECT * FROM [dbo].[Sample]

SQL design for various data types

I need to store data in a SQL Server 2008 database from various data sources with different data types. Data types allowed are: Bit, Numeric (1, 2 or 4 bytes), Real and String. There is going to be a value, a timestamp, a FK to the item of which the value belongs and some other information for the data stored.
The most important points are the read performance and the size of the data. There might be a couple thousand items and each item may have millions of values.
I have 5 possible options:
Separate tables for each data type (ValueBit, ValueTinyInt, ValueSmallInt, etc... tables)
Separate tables with inheritance (Value table as base table, ValueBit table just for storing the Bit value, etc...)
Single value table for all data types, with separate fields for each data type (Value table, with ValueBit BIT, ValueTinyInt TINYINT etc...)
Single table and single value field using sql_variant
Single table and single value field using UDT
With case 2, a PK is a must, and:
1,000 items * 10,000,000 values each > Int32.Max, and
1,000 items * 10,000,000 values each * an 8-byte BIGINT PK is huge.
Other than that, I am considering 1 or 3 with no PK. Will they differ in size?
I do not have experience with 4 or 5 and I do not think that they will perform well in this scenario.
Which way shall I go?
Your question is hard to answer, as you seem to be using a relational database system for something it is not designed for. The data you want to keep in the database seems to be too unstructured to get much benefit from a relational database system. Database designs with mostly fields like "parameter type" and "parameter value" that try to cover very generic situations are generally considered bad designs. Maybe you should consider using a non-relational database like BigTable. If you really want to use a relational database system, I'd strongly recommend reading Beginning Database Design by Clare Churcher. It's an easy read, but it gets you on the right track with respect to RDBMS design.
What are the usage scenarios? Start with samples of queries and work out the necessary indexes.
Consider data partitioning, as mentioned before. Try to understand your data and its relations better. I believe the decision should be based on the business meaning/usage of the data.
I think it's a great question - This situation is fairly common, though it is awkward to make tables to support it.
In terms of performance, having a table like the one indicated in #3 potentially wastes a huge amount of storage and RAM, because for each row you allocate space for a value of every type but only use one. If you use the new sparse column feature of 2008 it could help, but there are other issues too: it's a little hard to constrain/normalize, because you want only one of the multiple values to be populated for each row - having two values in two columns would be an error, but the design doesn't reflect that. I'd cross that off.
So, if it were me, I'd be looking at option 1, 2, or 4, and the decision would be driven by this: do I typically need to make one query returning rows that have a mix of values of different types in the same result set? Or will I almost always ask for the rows by item and by type? I ask because if the values are different types, that implies to me some difference in the source or the use of that data (you are unlikely, for example, to compare a string and a real, or a string and a bit). This is relevant because having different tables per type might actually be a significant performance/scalability advantage, if partitioning the data that way makes queries faster. Partitioning data into smaller sets of more closely related data can give a performance advantage.
It's like having all the data in one massive (albeit sorted) set or having it partitioned into smaller, related sets. The smaller sets favor some types of queries, and if those are the queries you will need, it's a win.
Details:
CREATE TABLE [dbo].[items](
[itemid] [int] IDENTITY(1,1) NOT NULL,
[item] [varchar](100) NOT NULL,
CONSTRAINT [PK_items] PRIMARY KEY CLUSTERED
(
[itemid] ASC
)
)
/* This table has the problem of allowing two values
in the same row, plus allocates but does not use a
lot of space in memory and on disk (bad): */
CREATE TABLE [dbo].[vals](
[itemid] [int] NOT NULL,
[datestamp] [datetime] NOT NULL,
[valueBit] [bit] NULL,
[valueNumericA] [numeric](2, 0) NULL,
[valueNumericB] [numeric](8, 2) NULL,
[valueReal] [real] NULL,
[valueString] [varchar](100) NULL,
CONSTRAINT [PK_vals] PRIMARY KEY CLUSTERED
(
[itemid] ASC,
[datestamp] ASC
)
)
ALTER TABLE [dbo].[vals] WITH CHECK
ADD CONSTRAINT [FK_vals_items] FOREIGN KEY([itemid])
REFERENCES [dbo].[items] ([itemid])
GO
ALTER TABLE [dbo].[vals] CHECK CONSTRAINT [FK_vals_items]
GO
/* This is probably better, though casting is required
all the time. If you search with the variant as criteria,
that could get dicey as you have to be careful with types,
casting and indexing. Also everything is "mixed" in one
giant set */
CREATE TABLE [dbo].[allvals](
[itemid] [int] NOT NULL,
[datestamp] [datetime] NOT NULL,
[value] [sql_variant] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[allvals] WITH CHECK
ADD CONSTRAINT [FK_allvals_items] FOREIGN KEY([itemid])
REFERENCES [dbo].[items] ([itemid])
GO
ALTER TABLE [dbo].[allvals] CHECK CONSTRAINT [FK_allvals_items]
GO
/* This would be an alternative, but you trade multiple
queries and joins for the casting issue. OTOH the implied
partitioning might be an advantage */
CREATE TABLE [dbo].[valsBits](
[itemid] [int] NOT NULL,
[datestamp] [datetime] NOT NULL,
[val] [bit] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[valsBits] WITH CHECK
ADD CONSTRAINT [FK_valsBits_items] FOREIGN KEY([itemid])
REFERENCES [dbo].[items] ([itemid])
GO
ALTER TABLE [dbo].[valsBits] CHECK CONSTRAINT [FK_valsBits_items]
GO
CREATE TABLE [dbo].[valsNumericA](
[itemid] [int] NOT NULL,
[datestamp] [datetime] NOT NULL,
[val] numeric( 2, 0 ) NOT NULL
) ON [PRIMARY]
GO
... FK constraint ...
CREATE TABLE [dbo].[valsNumericB](
[itemid] [int] NOT NULL,
[datestamp] [datetime] NOT NULL,
[val] numeric ( 8, 2 ) NOT NULL
) ON [PRIMARY]
GO
... FK constraint ...
etc...
