Complex refactor and version control with Database Projects - sql-server

Let's say I have a table like so:
CREATE TABLE Foo
(
Id INT IDENTITY NOT NULL PRIMARY KEY,
Data VARCHAR(10) NOT NULL,
TimeStamp DATETIME NOT NULL DEFAULT GETUTCDATE()
);
Now let's say I build this in a SQL Server Database Project, and I publish this in version 1.0 of my application. The application is deployed, and the table is used as expected.
For the 1.1 release, the product owners decide they want to track the source of the data, and this will be a required column going forward. For the data that already exists in the database, if the Data column is numeric, they want the Source to be 'NUMBER'. If not, it should be 'UNKNOWN'.
The table in the database project now looks like this:
CREATE TABLE Foo
(
Id INT IDENTITY NOT NULL PRIMARY KEY,
Data VARCHAR(10) NOT NULL,
Source VARCHAR(10) NOT NULL,
TimeStamp DATETIME NOT NULL DEFAULT GETUTCDATE()
);
This builds fine, but deploying the upgrade is a problem: it breaks if data already exists in the table. The generated script creates a temporary table, moves the data from the old table into the temp one, drops the old table, and renames the temp table to the original name... but the data move fails, because it cannot assign values to the non-nullable column Source.
For trivial refactors, the refactor log tracks changes in the schema, and maintains awareness of the modified database objects, but there doesn't seem to be a way to do this when you get your hands a little dirty.
How can the Database Project be leveraged to replace the default script for this change with a custom one that properly captures the upgrade logic? There must be some way to address this issue.
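For concreteness, here is a minimal sketch of the upgrade logic such a custom script would need to capture, assuming ISNUMERIC is an acceptable numeric test (in SSDT this kind of logic typically lives in a pre-deployment script, so it runs before the generated plan touches the table):
-- Sketch only: add the column as nullable, backfill existing rows, then tighten to NOT NULL
IF COL_LENGTH('dbo.Foo', 'Source') IS NULL
BEGIN
    ALTER TABLE dbo.Foo ADD Source VARCHAR(10) NULL;
END;
GO
UPDATE dbo.Foo
SET Source = CASE WHEN ISNUMERIC(Data) = 1 THEN 'NUMBER' ELSE 'UNKNOWN' END
WHERE Source IS NULL;
GO
ALTER TABLE dbo.Foo ALTER COLUMN Source VARCHAR(10) NOT NULL;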

Related

SSIS - Auto increment field is not inserted correctly with data flow task

I am trying to copy data from one database to another using ssis. I created the dtsx package with the SQL Server Import and Export Wizard.
The table I am copying from has a column named "Id"; the other table has one named "ModuleCategoryId", and I mapped the two together.
ModuleCategoryId is the identity, and has an auto increment of 1.
In the source database, the Id's are not ordered, and go like this:
32 Name1
14 Name2
7 Name3
After executing the data flow, the destination DB looks like this:
1 Name1
2 Name2
3 Name3
I have enabled identity insert in the wizard, but this doesn't do anything.
The destination database was made with Entity Framework, code first.
If I explicitly turn off ValueGeneratedOnAdd, and remake the destination database, the data is being transferred correctly, but I was wondering if there's a way to transfer all the data without turning off the auto increment, and then turning it back on.
If I manually set Identity Insert on for that table, I can insert rows with whatever ModuleCategoryId I want, so it must be something with the dataflow.
Table definitions are table definitions - regardless of the syntactic sugar ORM tools might overlay.
I created a source and destination table and populated the source to match your supplied data. I do define the identity property on the destination table as well. Whether that is how ValueGeneratedOnAdd is implemented in the API, I don't know, but it almost has to be; otherwise the Enable Identity Insert should fail (if the UI even allows it).
The IDENTITY property allows you to seed it with any initial value you want. For the target table, I seed at the minimum value allowed for a signed integer, so that if the identity insert doesn't work, the resulting values will look really "wrong".
DROP TABLE IF EXISTS dbo.SO_67370325_Source;
DROP TABLE IF EXISTS dbo.SO_67370325_Destination;
CREATE TABLE dbo.SO_67370325_Source
(
Id int IDENTITY(1,1) NOT NULL
, Name varchar(50)
);
CREATE TABLE dbo.SO_67370325_Destination
(
ModuleCategoryId int IDENTITY(-2147483648,1) NOT NULL
, Name varchar(50)
);
CREATE TABLE dbo.SO_67370325_Destination_noident
(
ModuleCategoryId int NOT NULL
, Name varchar(50)
);
SET IDENTITY_INSERT dbo.SO_67370325_Source ON;
INSERT INTO dbo.SO_67370325_Source
(
Id
, Name
)
VALUES
(32, 'Name1')
, (14, 'Name2')
, (7, 'Name3');
SET IDENTITY_INSERT dbo.SO_67370325_Source OFF;
INSERT INTO dbo.SO_67370325_Source
(
Name
)
OUTPUT Inserted.*
VALUES
(
'Inserted naturally' -- Name - varchar(50)
);
Beyond your 3 supplied values, I added a fourth, and if you run the supplied query, you'll see the generated ID is likely 33. The source table is created with an identity seeded at 1, but the explicit identity inserts on the source table advance the current identity value to 32. Assuming no other activity occurs, the next value would be 33, since our increment is 1.
All that said, I have 3 scenarios established. In the Import Export wizard, I checked the Identity Insert and mapped Id to ModuleCategoryId and ran the package.
ModuleCategoryId|Name
32|Name1
14|Name2
7|Name3
33|Inserted naturally
The data in the target table is identical to the source - as expected. At this point, the identity seed is sitting at 33 which I could verify with some DBCC check command I don't have handy.
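For reference, the command alluded to is most likely DBCC CHECKIDENT; with NORESEED it only reports the current identity value without changing it:
DBCC CHECKIDENT ('dbo.SO_67370325_Destination', NORESEED);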
The next case is taking the same package and unchecking the Identity Insert property. This becomes invalid as I'd get an error reporting
Failure inserting into the read-only column "ModuleCategoryId"
The only option is to unmap the Id to ModuleCategoryId. Assuming I loaded to the same table as before, I would see data something like this
ModuleCategoryId|Name
34|Name1
35|Name2
36|Name3
37|Inserted naturally
If I had never put a record into this table, then I'd get results like
ModuleCategoryId|Name
-2147483648|Name1
-2147483647|Name2
-2147483646|Name3
-2147483645|Inserted naturally
WITHOUT AN EXPLICIT ORDER BY ON MY SOURCE, THERE IS NO GUARANTEE OF RESULTS ORDERING. I fight this battle often. The SQL Engine has no obligation to return data in the primary key order or any other such order unless you explicitly ask for it. Had the following results been stored, it would be equally valid.
ModuleCategoryId|Name
34|Inserted naturally
35|Name1
36|Name2
37|Name3
If you have a requirement for data to be inserted into the target table based on the ascending values of Id in the source table, then in the Import/Export wizard you need to go to the screen that asks whether you want to pick tables or write a query, and choose the second option, query. There you will write SELECT * FROM dbo.SO_67370325_Source ORDER BY Id; (or whatever your source table is named).
The final test, loading SO_67370325_Destination_noident, demonstrates a table with no identity property defined. If I do not map Id to ModuleCategoryId, the package will fail, as the column is defined as NOT NULL. When I map Id to ModuleCategoryId, I see the same results as the first case (7, 14, 32, 33) BUT every subsequent insert into the target table will have to provide its own Id, which may or may not align with what your FluentAPI stuff does.
Similar question/answer Error 0xc0202049: Data Flow Task 1: Failure inserting into the read-only column

Maintaining an audit trail on specific entities (tables) in SQL Server, using triggers (and a stored proc)

I want to maintain an audit trail for some tables in a SQL Server database. For these tables, I want to record all CRUD activities on the entity (object) stored in the table. This is the table schema I have come up with so far. For updates to an entity, please note that I am storing detailed data showing the fields that were changed, in a sub (related) table. To be able to store values of different data types in the same column, I am storing text representations of the field values.
I would want some feedback on:
The table design (can it be improved?, are there any gotchas to be aware of with this design)?
I would like to write a trigger (per monitored table), so that I can provide an audit trail for CRUD operations on the monitored entity. I am new to triggers, so an example of how I can use a trigger to record CRUD operations on entity Foo would be very useful (see the sketch after the notes below). Being the lazy bugger that I am, I want to keep the SQL DRY, so I would like all of the triggers to call a common stored proc that takes the (what, why, when) as parameters.
Code:
CREATE TABLE Foo (
id INTEGER NOT NULL PRIMARY KEY,
field_1 REAL,
field_2 VARCHAR(256) NOT NULL,
field_3 BIT,
field_4 IMAGE,
field_5 VARCHAR(256), -- Path to an image file
field_6 DATE,
field_7 TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
field_8 INTEGER REFERENCES Personnel(id) ON UPDATE CASCADE ON DELETE NO ACTION
);
-- store CRUD actions info (who, what, when, why) on specific tables
CREATE TABLE GenericRecordArchive (
id INTEGER NOT NULL PRIMARY KEY,
personnel_id INTEGER REFERENCES Personnel(id) ON UPDATE CASCADE ON DELETE NO ACTION, -- who
monitored_object_id INTEGER REFERENCES MonitoredObject(id) ON UPDATE CASCADE ON DELETE NO ACTION, -- what ...
crud_type_id INTEGER REFERENCES CrudType(id) ON UPDATE CASCADE ON DELETE NO ACTION,
reason TEXT NOT NULL, -- why
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP -- when
);
-- Detailed archive info for updates, stored changed fields (for updates only)
CREATE TABLE GenericRecordArchiveInfo (
id INTEGER NOT NULL PRIMARY KEY,
archive_record_id INTEGER REFERENCES GenericRecordArchive(id) ON UPDATE CASCADE ON DELETE NO ACTION,
field_name VARCHAR(128) NOT NULL,
old_value TEXT,
new_value TEXT,
created_at TIMESTAMP
);
Notes:
For the sake of simplicity, I have not included indices etc in the schema above
Although I am working on SQL Server, I may want to port this to some other database in the future, so wherever possible, I would like the SQL to be as db agnostic as possible (unless there are huge efficiency gains, parsimony of code etc. by keeping the SQL code SQL Server centric).
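For illustration, here is a minimal sketch of the trigger-plus-shared-proc pattern being asked about. It assumes GenericRecordArchive.id is changed to an IDENTITY column, that created_at is a datetime with a default, and that CrudType ids 1/2/3 mean insert/update/delete; the "who" and "why" values cannot be derived inside a trigger, so the values below are placeholders:
-- Shared logging proc; each monitored table's trigger calls this
CREATE PROCEDURE RecordCrudAction
    @personnel_id INT,
    @monitored_object_id INT,
    @crud_type_id INT,
    @reason VARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO GenericRecordArchive (personnel_id, monitored_object_id, crud_type_id, reason)
    VALUES (@personnel_id, @monitored_object_id, @crud_type_id, @reason);
END;
GO
-- One thin trigger per monitored table (here: Foo)
CREATE TRIGGER trg_Foo_Audit ON Foo
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @crud_type_id INT =
        CASE
            WHEN EXISTS (SELECT 1 FROM inserted) AND EXISTS (SELECT 1 FROM deleted) THEN 2 -- update
            WHEN EXISTS (SELECT 1 FROM inserted) THEN 1 -- insert
            ELSE 3 -- delete
        END;
    EXEC RecordCrudAction
        @personnel_id = NULL,       -- "who" is not knowable inside the trigger without app context
        @monitored_object_id = 1,   -- placeholder id for Foo in MonitoredObject
        @crud_type_id = @crud_type_id,
        @reason = 'not captured';   -- "why" must be supplied by the application
END;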

Correct SQL to convert MySQL tables to SQL Server tables

I have a number of tables I need to convert from MySQL to SQL Server.
An example of a MySQL table is
CREATE TABLE `required_items` (
`id` INT( 11 ) NOT NULL AUTO_INCREMENT PRIMARY KEY COMMENT 'Unique Barcode ID',
`fk_load_id` INT( 11 ) NOT NULL COMMENT 'Load ID',
`barcode` VARCHAR( 255 ) NOT NULL COMMENT 'Barcode Value',
`description` VARCHAR( 255 ) NULL DEFAULT NULL COMMENT 'Barcode Description',
`created` TIMESTAMP NULL DEFAULT NULL COMMENT 'Creation Timestamp',
`modified` TIMESTAMP ON UPDATE CURRENT_TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Modified Timestamp',
FOREIGN KEY (`fk_load_id`) REFERENCES `loads`(`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = InnoDB CHARACTER SET ascii COLLATE ascii_general_ci COMMENT = 'Contains Required Items for the Load';
And a trigger to update the created date
CREATE TRIGGER required_items_before_insert_created_date BEFORE INSERT ON `required_items`
FOR EACH ROW
BEGIN
SET NEW.created = CURRENT_TIMESTAMP;
END
Now I need to create tables similar to this in SQL Server. There seem to be a lot of different data types available, so I am unsure which to use.
What data type should I use for the primary key column
(uniqueidentifier, bigint, int)?
What should I use for the timestamps
(timestamp, datetime, datetime2(7))?
How should I enforce the created
and modified timestamps (currently I am using triggers)?
How can I enforce foreign key constraints?
Should I be using Varchar(255) in SQL Server? (Maybe Text, Varchar(MAX) is better)
I am using Visual Studio 2010 to create the tables.
First of all, you can probably use PHPMyAdmin (or something similar) to script out the table creation process for SQL Server. You can take a look at what is automatically created for you to get an idea of what you should be using. After that, consider SSMS (SQL Server Management Studio) over Visual Studio 2010. Tweaking the tables that your script will create will be easier in SSMS - in fact, most database development tasks will be easier in SSMS.
What data type should I use to the primary key column (uniqueidentifier, bigint, int)?
Depending on how many records you plan to have in your table, use int or bigint. There are problems with uniqueidentifiers that you will probably want to avoid: INT vs Unique-Identifier for ID field in database
What should I use for the timestamps (timestamp, datetime, datetime2(7))?
timestamps are different in SQL Server than in MySQL. Despite the name, a SQL Server timestamp is an incrementing number used as a mechanism to version rows: http://msdn.microsoft.com/en-us/library/ms182776%28v=sql.90%29.aspx . In short, though, datetime is probably your best bet for compatibility purposes.
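To make the distinction concrete, here is a minimal sketch (the table and column names are illustrative); rowversion is the current name for the timestamp type:
CREATE TABLE dbo.VersionDemo
(
    id int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    rv rowversion,                                -- auto-updated row version; not a point in time
    created datetime NOT NULL DEFAULT GETDATE()   -- an actual date and time
);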
How should I enforce the created and modified timestamps (currently I am using triggers)?
See above. Also, the SQL Server version of a "Timestamp" is automatically updated by the DBMS. If you need a timestamp similar to your MySQL version, you can use a trigger to do that (but that is generally frowned upon...kind of dogmatic really).
How can I enforce foreign key constraints.
You should treat them as you would using innoDB. See this article for examples of creating foreign key constraints http://blog.sqlauthority.com/2008/09/08/sql-server-%E2%80%93-2008-creating-primary-key-foreign-key-and-default-constraint/
Should I be using Varchar(255) in SQL Server? (Maybe Text, Varchar(MAX) is better)
That depends on the data you plan to store in the field. Note that varchar(max) is not the same as varchar(8000); it can hold up to 2 GB of data. If you don't need varchar(255), you can always set it to a lower value like varchar(50). Using a field size that is too large has performance implications. One thing to note is that if you plan to support unicode (multilingual) data in your field, use nvarchar or nchar.
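Putting the pieces together, here is a hedged sketch of how the converted table might look in T-SQL. The constraint and trigger names are illustrative, and a DEFAULT stands in for the MySQL BEFORE INSERT trigger on created:
CREATE TABLE dbo.required_items
(
    id int IDENTITY(1,1) NOT NULL PRIMARY KEY,   -- AUTO_INCREMENT equivalent
    fk_load_id int NOT NULL,
    barcode varchar(255) NOT NULL,
    description varchar(255) NULL,
    created datetime NOT NULL DEFAULT GETDATE(), -- replaces the BEFORE INSERT trigger
    modified datetime NOT NULL DEFAULT GETDATE(),
    CONSTRAINT FK_required_items_loads FOREIGN KEY (fk_load_id)
        REFERENCES dbo.loads (id) ON DELETE CASCADE ON UPDATE CASCADE
);
GO
-- Keeps modified current on every update, mimicking MySQL's ON UPDATE CURRENT_TIMESTAMP
CREATE TRIGGER dbo.trg_required_items_modified ON dbo.required_items
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE ri
    SET modified = GETDATE()
    FROM dbo.required_items AS ri
    INNER JOIN inserted AS i ON i.id = ri.id;
END;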

Large sample database for HSQLDB?

I'm taking a database class and I'd like to have a large sample database to experiment with. My definition of large here is that there's enough data in the database so that if I try a query that's very inefficient, I'll be able to tell by the amount of time it takes to execute. I've googled for this and not found anything that's HSQLDB specific, but maybe I'm using the wrong keywords. Basically I'm hoping to find something that's already set up, with the tables, primary keys, etc. and normalized and all that, so I can try things out on a somewhat realistic database. For HSQLDB I guess that would just be the .script file. Anyway if anybody knows of any resources for this I'd really appreciate it.
You can use the MySQL Sakila database schema and data (open source, on the MySQL web site), but you need to modify the schema definition. You can delete the view and trigger definitions, which are not necessary for your experiment. For example:
CREATE TABLE country (
country_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
country VARCHAR(50) NOT NULL,
last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (country_id)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;
modified:
CREATE TABLE country (
country_id SMALLINT GENERATED BY DEFAULT AS IDENTITY,
country VARCHAR(50) NOT NULL,
last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (country_id)
)
Some MySQL DDL syntax is supported in the MYS syntax mode of HSQLDB, for example AUTO_INCREMENT is translated to IDENTITY, but others need manual editing. The data is mostly compatible, apart from some binary strings.
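If you would rather lean on the compatibility mode than edit everything by hand, MYS mode can be switched on per database; a one-line sketch:
-- Enable HSQLDB's MySQL compatibility mode before running the modified script
SET DATABASE SQL SYNTAX MYS TRUE;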
You need to access the database with a tool that reports the query time. The HSQLDB DatabaseManager does this when the query output is in Text mode.

how to resolve circular reference in Django model?

I'm converting an old app from ASP.NET + MSSQL to Django + Postgres. The existing design looks like this:
create table foo
( id integer
, name varchar(20)
, current_status_id integer null
)
create table foo_status
( id integer
, foo_id integer
, status_date datetime
, status_description varchar(100)
)
So each foo has multiple foo_status records, but there is a denormalized field, current_status_id, that points to the latest status record.
To convert the data, I just defined foo.current_status_id as an IntegerField, not as a ForeignKey, because Postgres would (correctly) gripe about missing foreign keys no matter which table I loaded first.
Now that I've converted the data, I'd like to have all the foreign-key goodness again, for things like querying. Is there a good way to handle this besides changing the model from IntegerField before I do a syncdb to ForeignKey afterward?
A few points about how django works:
./manage.py syncdb does not modify existing tables. You can modify your model fields and run syncdb, but your db will stay intact. If you do need this functionality, use south.
When creating a new instance x with a myfk ForeignKey field, setting its x.myfk_id by assigning it an integer, and x.save()ing it, the constraint is checked only at the db level: Django will not throw an exception if the referenced records are missing. Therefore, you can first create the tables without the constraints (either by using IntegerField + syncdb as you suggested, or by carefully running a modified version of the ./manage.py sqlall output for the ForeignKey version), load your data, and then ALTER TABLE your db manually, as sketched below.
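A minimal sketch of that manual step in Postgres, assuming foo_status.id is a primary key; the constraint name is illustrative:
-- Run after the data load, once both tables are populated
ALTER TABLE foo
    ADD CONSTRAINT foo_current_status_id_fk
    FOREIGN KEY (current_status_id) REFERENCES foo_status (id);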
