Do you know what's the syntax to add a column with a default constraint of CURRENT_TIMEStAMP() on Snowflake?
I was trying:
ALTER TABLE clients
ADD COLUMN inserted_at TIMESTAMP_NTZ DEFAULT CAST(CURRENT_TIMESTAMP AS TIMESTAMP_NTZ);
But it shows me this error:
SQL compilation error: Invalid column default expression [CAST(CURRENT_TIMESTAMP() AS TIMESTAMP_NTZ(9))]
Thank you for your help!
As Mike noted, the answers is, "you can not do that", which is the error is say.
What the effect of the command would be, if it "worked for free", would be the same is making a new table, inserting the CURRENT_TIMESTMAP for all rows. Which is what happens when you insert a new row, with the value missing. This raises an interesting question, of "what values make sense for the historically inserted rows" and this is in part what the free lunch is missing, because you have to decide what that would mean, because if you had a "years word of data" and inserted_at is going to be used to do some neat aggregation, or other "value enriching", what does it mean have this lump of "one years of values that all have the value now".
Given this is your data, you need to know what what to make of this lump, or how to correctly back-fill that data.
This in general is the nature of "missing" free-lunch features, ether they do not scale to large data for free, which implies as the owner of the data you are in the best position to write DML's the work best for your data, OR the meaning of the data cannot be autogenerated in a way the is "always correct" thus the edge cases of that data, need to be understood by the users.
Related
Let's say I do a query against my context to retrieve a particular entity.
Now I'd like to find the best way to know if the corresponding row in the database (let's stay simple with a 1-1 mapping between entity and SQL Table) has changed since the creation of my entity.
I've thought about using a TimeStamp column and execute a simple query each each I want to know if the entity is outdated, like:
var uptodate = (from e in context.mySet
where e.TimeStamp == entityTimeStamp
select e).Any();
By indexing the TimeStamp column, I think it would be a fast way to go, but unfortunately I didn't find any confirmation around the internet...
If you're looking for a way to do this without having to modify your table, you could use the CHECKSUM command and add an additional column to your mapping (as a view presumably). This way you don't have to worry about adding a new column to the database.
Something like this should work :
select *,
BINARY_CHECKSUM(*) as CheckSumValue
from test WITH (NOLOCK);
Here is some sample fiddle.
With that said, sometimes it's preferable to have a ModifiedDate field in your table. If that is the case, then that would surely be your best way of checking for changes.
Good luck.
If optimistic locking is OK, How about adding the [ConcurrencyCheck] attribute to the field in your data class. This causes the update to fail if the record has been modified.
The attribute is from System.ComponentModel.DataAnnotations
After further tests and investigation using the TimeStamp and a query to know if it's still the same (more precisely if the given value is still in the table) seems the best way to go.
Such mechanism needs an index on the TimeStamp column to ensure the best performance (roughly O(Log(n) instead of (O(n)).
I'm working in Visual Studio - VB.NET.
My problem is that I want to delete a specific row in SQL Server but the only unique column I have is an Identity that increments automatically.
My process of work:
1. I add a row in the column (the identity is being incremented, but I don't know the number)
2. I want to delete the previous row
Is there a sort of unique ID that every new record has?
It's possible that my table has 2 exactly the same records, just the sequence (identity) is different.
Any ideas how to handle this problem?
SQL Server has a few functions that return the generated ID for the last rows, each with it's own specific strengths and weaknesses.
Basically:
##IDENTITY works if you do not use triggers
SCOPE_IDENTITY() works for the code you explicitly called.
IDENT_CURRENT(‘tablename’) works for a specific table, across all scopes.
In almost all scenarios SCOPE_IDENTITY() is what you need, and it's a good habit to use it, opposed to the other options.
A good discussion on the pros and cons of the approaches is also available here.
I want to delete the previous row
And that is your problem. There is no such concept in SQL as a 'previous row'. The word previous implies order and order applies only to queries, where is achieved by adding an ORDER BY clause. Tables have no order. You need to rephrase this in terms of "I need to delete the record that satisfies <this> condition.". This may sound to you like pedantic gibberish, but you will never find a solution until you acknowledged the problem.
Searching for a way to interpret the value of the inserted identity column and then subtracting 1 from it is flawed with many many many problems. It is incorrect under concurrency. It is incorrect in presence of rollbacks. It is incorrect after ETL jobs. Overall, never expect monotonically increasing identities, they're free to jump gaps and your code should be correct in presence of gaps.
I have a big fat query that's written dynamically to integrate some data. Basically what it does is query some tables, join some other ones, treat some data, and then insert it into a final table.
The problem is that there's too much data, and we can't really trust the sources, because there could be some errored or inconsistent data.
For example, I've spent almost an hour looking for an error while developing using a customer's database because somewhere in the middle of my big fat query there was an error converting some varchar to datetime. It turned out to be that they had some sales dating '2009-02-29', an out-of-range date.
And yes, I know. Why was that stored as varchar? Well, the source database has 3 columns for dates, 'Month', 'Day' and 'Year'. I have no idea why it's like that, but still, it is.
But how the hell would I treat that, if the source is not trustable?
I can't HANDLE exceptions, I really need that it comes up to another level with the original message, but I wanted to provide some more info, so that the user could at least try to solve it before calling us.
So I thought about displaying to the user the row number, or some ID that would at least give him some idea of what record he'd have to correct. That's also a hard job because there will be times when the integration will run up to 80000 records.
And in an 80000 records integration, a single dummy error message: 'The conversion of a varchar data type to a datetime data type resulted in an out-of-range datetime value' means nothing at all.
So any idea would be appreciated.
Oh I'm using SQL Server 2005 with Service Pack 3.
EDIT:
Ok, so for what I've read as answers, best thing to do is check each column that could be critical to raising errors, and if they do attend the condition, I should myself raise an error, with the message I find more descriptive, and add some info that could have been stored in a separate table or some variables, for example the ID of the row, or some other root information.
for dates you can use the isdate function
select ISDATE('20090229'),ISDATE('20090227')
I usually insert into a staging table, do my checks and then insert into the real tables
My suggestion would be to pre-validate the incoming data, and as you encounter errors, set aside the record. For example, check for invalid dates. Say you find 20 in a set of 80K. Pull those 20 out into a separate table, with the error message attached to the record. Run your other validation, then finally import the remaining (all valid) records into the desired target table(s).
This might have too much impact on performance, but would allow you to easily point out the errors and allow them to be corrected and then inserted in a second pass.
This sounds like a standard ETL issue: Extract, Transform, and Load. (Unless you have to run this query over and over again against the same set of data, in which case you'd pretty much do the same thing, only over and over again. So how critical is performance?)
What kind of error handling and/or "reporting of bad data" are you allowed to provide? If you have everything as "one big fat query", your options become very limited -- either the query works or it doesn't, and if it doesn't I'm guessing you get at best one RAISERROR message to tell the caller what's what.
In a situation like this, the general framework I'd try to set up is:
Starting with the source table(s)
Produce an interim set of tables (SQLMenace's staging tables) that you know are consistant and properly formed (valid data, keys, etc.)
Write the "not quite so big and fat query" against those tables
Done this way, you should always be able to return (or store) a valid data set... even if it is empty. The trick will be in determining when the routine fails -- when is the data too corrupt to process and produce the desired results, so you return a properly worded error message instead?
try something like this to find the rows:
...big fat query here...
WHERE ISDATE(YourBadVarcharColumn)!=1
Load the Data into a staging table, where most columns are varchar and allow NULLs, where you have a status column.
Run an UPDATE command like
UPDATE Staging
SET Status='X'
WHERE ISDATE(CONVERT(YourCharYear+YourCharMonth+YourCharDat+)!=1
OR OtherColumn<4...
Then just insert from your staging table where Status!='X'
INSERT INTO RealTable
(col1, col2...)
SELECT
col1, col2, ...
where Status!='X'
A few months back, I started using a CRUD script generator for SQL Server. The default insert statement that this generator produces, SELECTs the inserted row at the end of the stored procedure. It does the same for the UPDATE too.
The previous way (and the only other way I have seen online) is to just return the newly inserted Id back to the business object, and then have the business object update the Id of the record.
Having an extra SELECT is obviously an additional database call, and more data is being returned to the application. However, it allows additional flexibility within the stored procedure, and allows the application to reflect the actual data in the table.
The additional SELECT also increases the complexity when wanting to wrap the insert/update statements in a transaction.
I am wondering what people think is better way to do it, and I don't mean the implementation of either method. Just which is better, return just the Id, or return the whole row?
We always return the whole row on both an Insert and Update. We always want to make sure our client apps have a fresh copy of the row that was just inserted or updated. Since triggers and other processes might modify values in columns outside of the actual insert/update statement, and since the client usually needs the new primary key value (assuming it was auto generated), we've found it's best to return the whole row.
The select statement will have some sort of an advantage only if the data is generated in the procedure. Otherwise the data that you have inserted is generally available to you already so no point in selecting and returning again, IMHO. if its for the id then you can have it with SCOPE_IDENTITY(), that will return the last identity value created in the current session for the insert.
Based on my prior experience, my knee-jerk reaction is to just return the freshly generated identity value. Everything else the application is inserting, it already knows--names, dollars, whatever. But a few minutes reflection and reading the prior 6 (hmm, make that 5) replies, leads to a number of “it depends” situations:
At the most basic level, what you inserted is what you’d get – you pass in values, they get written to a row in the table, and you’re done.
Slightly more complex that that is when there are simple default values assigned during an insert statement. “DateCreated” columns that default to the current datetime, or “CreatedBy” that default to the current SQL login, are a prime example. I’d include identity columns here, since not every table will (or should) contain them. These values are generated by the database upon table insertion, so the calling application cannot know what they are. (It is not unknown for web server clocks to not be synchronized with database server clocks. Fun times…) If the application needs to know the values just generated, then yes, you’d need to pass those back.
And then there are are situations where additional processing is done within the database before data is inserted into the table. Such work might be done within stored procedures or triggers. Once again, if the application needs to know the results of such calculations, then the data would need to be returned.
With that said, it seems to me the main issue underlying your decision is: how much control/understanding do you have over the database? You say you are using a tool to automatically generate your CRUD procedures. Ok, that means that you do not have any elaborate processing going on within them, you’re just taking data and loading it on in. Next question: are there triggers (of any kind) present that might modify the data as it is being written to the tables? Extend that to: do you know whether or not such triggers exists? If they’re there and they matter, plan accordingly; if you do not or cannot know, then you might need to “follow up” on the insert to see if changes occurred. Lastly: does the application care? Does it need to be informed of the results of the insert action it just requested, and if so, how much does it need to know? (New identity value, date time it was added, whether or not something changed the Name from “Widget” to “Widget_201001270901”.)
If you have complete understanding and control over the system you are building, I would only put in as much as you need, as extra code that performs no useful function impacts performance and maintainability. On the flip side, if I were writing a tool to be used by others, I’d try to build something that did everything (so as to increase my market share). And if you are building code where you don't really know how and why it will be used (application purpose), or what it will in turn be working with (database design), then I guess you'd have to be paranoid and try to program for everything. (I strongly recommend not doing that. Pare down to do only what needs to be done.)
Quite often the database will have a property that gives you the ID of the last inserted item without having to do an additional select. For example, MS SQL Server has the ##Identity property (see here). You can pass this back to your application as an output parameter of your stored procedure and use it to update your data with the new ID. MySQL has something similar.
INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.*
VALUES ('value1', 'value2')
With this clause, returning the whole row does not require an extra SELECT and performance-wise is the same as returning only the id.
"Which is better" totally depends on your application needs. If you need the whole row, return the whole row, if you need only the id, return only the id.
You may add an extra setting to your business object which can trigger this option and return the whole row only if the object needs it:
IF #return_whole_row = 1
INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.*
VALUES ('value1', 'value2')
ELSE
INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.id
VALUES ('value1', 'value2')
FI
I don't think I would in general return an entire row, but it could be a useful technique.
If you are code-generating, you could generate two procs (one which calls the other, perhaps) or parametrize a single proc to determine whther to return it over the wire or not. I doubt the DB overhead is significant (single-row, got to have a PK lookup), but the data on the wire from DB to client could be significant when all added up and if it's just discarded in 99% of the cases, I see little value. Having an SP which returns different things with different parameters is a potential problem for clients, of course.
I can see where it would be useful if you have logic in triggers or calculated columns which are managed by the database, in which case, a SELECT is really the only way to get that data back without duplicating the logic in your client or the SP itself. Of course, the place to put any logic should be well thought out.
Putting ANY logic in the database is usually a carefully-thought-out tradeoff which starts with the minimally invasive and maximally useful things like constraints, unique constraints, referential integrity, etc and growing to the more invasive and marginally useful tools like triggers.
Typically, I like logic in the database when you have multi-modal access to the database itself, and you can't force people through your client assemblies, say. In this case, I would still try to force people through views or SPs which minimize the chance of errors, duplication, logic sync issues or misinterpretation of data, thereby providing as clean, consistent and coherent a perimeter as possible.
I have been dancing around this issue for awhile but it keeps coming up. We have a system and our may of our tables start with a description that is originally stored as an NVARCHAR(150) and I then we get a ticket asking to expand the field size to 250, then 1000 etc, etc...
This cycle is repeated on ever "note" field and/or "description" field we add to most tables. Of course the concern for me is performance and breaking the 8k limit of the page. However, my other concern is making the system less maintainable by breaking these fields out of EVERY table in the system into a lazy loaded reference.
So here I am faced with these same to 2 options that have been staring me in the face. (others are welcome) please lend me your opinions.
Change all may notes and/or descriptions to NVARCHAR(MAX) and make sure we do exclude these fields in all listings. Basically never do a: SELECT * FROM [TableName] unless is it only retrieving one record.
Remove all notes and/or description fields and replace them with a forign key reference to a [Notes] table.
CREATE TABLE [dbo].[Notes] (
[NoteId] [int] NOT NULL,
[NoteText] [NVARCHAR](MAX)NOT NULL )
Obviously I would prefer use option 1 because it will change so much in our system if we go with 2. However if option 2 is really the only good way to proceed, then at least I can say these changes are necessary and I have done the homework.
UPDATE:
I ran several test on a sample database with 100,000 records in it. What I find is that the because of cluster index scans the IO required for option 1 is "roughly" twice that of option 2. If I select a large number of records (1000 or more) option 1 is twice as slow even if I do not include the large text field in the select. As I request less rows the lines blur more. I a web app where page sizes of 50 or so are the norm, so option 1 will work, but I will be converting all instances to option 2 in the (very) near future for scalability.
Option 2 is better for several reasons:
When querying your tables, the large
text fields fill up pages quickly,
forcing the database to scan more
pages to retrieve data. This is
especially taxing when you don't
actually need to return the text
data.
As you mentioned, it gives you
a clean break to change the data
type in one swoop. Microsoft has
deprecated TEXT in SQL Server 2008,
so you should stick with
VARCHAR/VARBINARY.
Separate filegroups. Having
all your text data in a slower,
cheaper storage location might be
something you decide to pursue in
the future. If not, no harm, no
foul.
While Option 1 is easier for now, Option 2 will give you more flexibility in the long-term. My suggestion would be to implement a simple proof-of-concept with the "notes" information separated from the main table and perform some of your queries on both examples. Compare the execution plans, client statistics and logical I/O reads (SET STATISTICS IO ON) for some of your queries against these tables.
A quick note to those suggesting the use of a TEXT/NTEXT from MSDN:
This feature will be removed in a
future version of Microsoft SQL
Server. Avoid using this feature in
new development work, and plan to
modify applications that currently use
this feature. Use varchar(max),
nvarchar(max) and varbinary(max) data
types instead. For more information,
see Using Large-Value Data Types.
I'd go with Option 2.
You can create a view that joins the two tables to make the transition easier on everyone, and then go through a clean-up process that removes the view and uses the single table wherever possible.
You want to use a TEXT field. TEXT fields aren't stored directly in the row; instead, it stores a pointer to the text data. This is transparent to queries, though - if you ask for a TEXT field, it will return the actual text, not the pointer.
Essentially, using a TEXT field is somewhat between your two solutions. It keeps your table rows much smaller than using a varchar, but you'll still want to avoid asking for them in your queries if possible.
The TEXT/NTEXT data type has practically unlimited length while taking up next to nothing in your record.
It comes with a few strings attached, like special behavior with string functions, but for a secondary "notes/description" type of field these may be less of a problem.
Just to expand on Option 2
You could:
Rename existing MyTable to MyTable_V2
Move the Notes column into a joined Notes table (with 1:1 joining ID)
Create a VIEW called MyTable that joins MyTable_V2 and Notes tables
Create an INSTEAD OF trigger on MyTable view which saves the Notes column into the Notes table (IF NULL then delete any existing Notes row, if NOT NULL then Insert if not found, otherwise Update). Perform appropriate action on MyTable_V2 table
Note: We've had trouble doing this where there is a Computed column in MyTable_V2 (I think that was the problem, either way we've hit snags when doing this with "unusual" tables)
All new Insert/Update/Delete code should be written to operate directly on MyTable_V2 and Notes tables
Optionally: Have the INSERT OF trigger on MyTable log the fact that it was called (it can do this minimally, UPDATE a pre-existing log table row with GetDate() only if existing row's date is > 24 hours old - so will only do an update once a day).
When you are no longer getting any log records you can drop the INSTEAD OF trigger on MyTable view and you are now fully MyTable_V2 compliant!
Huge amount of hassle to implement, as you surmised.
Alternatively trawl the code for all references to MyTable and change them to MyTable_V2, put a VIEW in place of MyTable for SELECT only, and not create the INSTEAD OF trigger.
My plan would be to fix all Insert/Update/Delete statements referencing the now deprecated MyTable. For me this would be made somewhat easier because we use unique names for all tables and columns in the database, and we use the same names in all application code, so making sure I had found all instances by a simple FIND would be high.
P.S. Option 2 is also preferable if you have any SELECT * lying around. We have had clients whos application performance has gone downhill fast when they added large Text/Blob columns to existing tables - because of "lazy" SELECT * statements. Hopefully that isn;t the case in your shop though!