Let's say I have the following data in my database:
k c1 c2
0 1 null
1 null 1
Say I run a calculation which gives me a list of pairs (k, c1). I want to push these pairs into my database as follows:
If k exists in my table, update c1 (leave c2 as is)
If k does not exist in my table, add a new row with c2 set to null (or whatever the default value is for that column)
Is there a way to do this in a single operation, akin to
table.insert_many(rows)
?
table.insert_many doesn't work for me, as it forces c2 to null even if c2 used to have a value.
Depends on your database, but most of them (including recent versions of sqlite) support some flavor of upsert / insert on conflict update.
http://docs.peewee-orm.com/en/latest/peewee/querying.html#upsert
Consult your database docs for details on the specific implementation.
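As a rough illustration of the upsert idea, here is a minimal sketch using Python's built-in sqlite3 module rather than peewee. The table name t and the helper upsert_c1 are made up for this example, and the ON CONFLICT ... DO UPDATE syntax assumes SQLite 3.24 or newer; other databases spell it differently.

```python
import sqlite3


def upsert_c1(conn, pairs):
    """Insert (k, c1) pairs; on an existing k, overwrite only c1."""
    conn.executemany(
        # 'excluded' refers to the row that failed to insert because k exists.
        "INSERT INTO t (k, c1) VALUES (?, ?) "
        "ON CONFLICT(k) DO UPDATE SET c1 = excluded.c1",
        pairs,
    )
    conn.commit()


# Hypothetical table mirroring the data in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, c1 INTEGER, c2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [(0, 1, None), (1, None, 1)])

# k=1 already exists (c2 must survive); k=2 is new (c2 gets its default, NULL).
upsert_c1(conn, [(1, 5), (2, 7)])
rows = conn.execute("SELECT k, c1, c2 FROM t ORDER BY k").fetchall()
print(rows)
```

Because c2 is never mentioned in the statement, existing rows keep their c2 value and new rows receive the column default.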
Let's consider simple Excel table associated with SQL Server's table:
ID some_data
0 a
1 b
2 c
I'd like to extend it with manually added column (not present in SQL Server's table):
ID some_data my_column
0 a some_data_for_0
1 b some_data_for_1
2 c some_data_for_2
However, when source data are changed (rows inserted / deleted / updated) the relation between my_column and ID column is not preserved. For example, when new row (3, d) is added:
ID some_data my_column
0 a some_data_for_0
1 b some_data_for_1
2 c
3 d some_data_for_2
Is there any Excel built-in solution that would allow me to specify how my_column rows should be ordered in relation to ID column or do I need to implement it by myself using VBA?
You could use an ORDER BY clause in your SQL statement, but even that's not very reliable. The only reliable way to do this is to store your additional data in its own table and use a formula to relate it to the SQL data.
On a separate worksheet, put
ID my_column
0 some_data_for_0
1 some_data_for_1
2 some_data_for_2
Now in a column adjacent to the SQL data, put
=IFERROR(VLOOKUP([#ID],tblAddtlInfo,2,FALSE),"")
However the SQL data is sorted, the additional info will be in the right row. This assumes you made your additional info list into a table and named it tblAddtlInfo.
If you want to get fancy, you can write some code in the Change event that looks for non-formulas in the extra column. If the formula gets over written, then grab the new data, add it to (or update) your additional info table, and restore the formula. Then you can type the data in the row, but maintain the integrity by moving it to a different table.
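The reason the lookup approach survives re-sorting and inserted rows is that the extra data is tied to the ID rather than to a row position. A minimal sketch of the same idea outside Excel, in Python (the names extra_by_id and attach_extra are hypothetical, chosen only for this illustration):

```python
# Side table keyed by ID, playing the role of tblAddtlInfo.
extra_by_id = {
    0: "some_data_for_0",
    1: "some_data_for_1",
    2: "some_data_for_2",
}


def attach_extra(sql_rows, extra):
    """Pair each (ID, some_data) row with its extra value, or "" if none exists.

    This mirrors IFERROR(VLOOKUP(...), ""): the match is by key, not by position.
    """
    return [(row_id, data, extra.get(row_id, "")) for row_id, data in sql_rows]


# The source rows arrive in a different order and include a new row (3, d).
sql_rows = [(3, "d"), (0, "a"), (2, "c"), (1, "b")]
combined = attach_extra(sql_rows, extra_by_id)
for row in combined:
    print(row)
```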
I have two tables with similar data. Wanting to find closest matches for comparison. Here's what I was trying to do:
select a.field1 as a1, b.field1 as b1, a.field2 as a2, b.field2 as b2
from foo a
left join (
select top 1 tmp.field1, tmp.field2
from foo2 tmp
-- The closest match will match the most fields. Add up these.
order by case when tmp.field1 = a.field1 then 1 else 0 end
+ case when tmp.field2 = a.field2 then 1 else 0 end
desc) b on 1 = 1
I can't reference the main selection table in the join though.
Perhaps I'm going about it all wrong. The actual goal is that I was given a spreadsheet of data and told to update a database. The spreadsheet has no PK and is missing many fields that the database has. Also, the database has foreign keys and child data all over. So I don't want to delete/insert. Instead I want to compare values and update wherever possible. So I created two temporary tables and pulled the database records into one and the spreadsheet records into another. Now I'm wanting to work with those two tables to update records, and finally delete/insert where no update is available.
Have you looked up the MERGE statement? It does what you want, although the syntax is a bit tricky.
Article here with half decent examples:
http://technet.microsoft.com/en-us/library/bb522522(v=sql.105).aspx
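Separately from MERGE, the "closest match" scoring described in the question (count how many fields agree and keep the best candidate) can be sketched in a few lines of Python. This is only an illustration of the scoring idea, not a replacement for doing it in the database; match_score and closest_match are hypothetical helper names, and ties go to the first candidate encountered.

```python
def match_score(a, b, fields):
    """Number of fields on which rows a and b hold equal values."""
    return sum(1 for f in fields if a[f] == b[f])


def closest_match(a, candidates, fields):
    """Candidate matching row a on the most fields, or None if there are none."""
    return max(candidates, key=lambda b: match_score(a, b, fields), default=None)


fields = ["field1", "field2"]
foo = [{"field1": "x", "field2": 1}]
foo2 = [
    {"field1": "x", "field2": 2},  # matches on field1 only
    {"field1": "x", "field2": 1},  # matches on both fields
    {"field1": "y", "field2": 3},  # matches on nothing
]

for a in foo:
    best = closest_match(a, foo2, fields)
    print(a, "->", best)
```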
Using SQL Server 2008, have three tables, table a, table b and table c.
All have an ID column, but for table a and b the ID column is an identity integer, for table c the ID column is a varchar type
Currently a stored procedure takes a name parameter and, following certain logic, inserts into table a or table b, gets the identity, prefixes it with 'A' or 'B', then inserts into table c.
Problem is, table C ID column potentially have the duplicated values, i.e. if identity from table A is 2, there might already have 'A2','A3','A5' in the ID column for table C, how to write a T-SQL query to identify the next available value in table C then ensure to update table A/B accordingly?
[Update]
this is the current step,
1. depends on input parameter, insert to table A or table B
2. initialize seed value = @@IDENTITY
3. calculate the ID value to insert into table C by prefixing the seed value with 'A' or 'B'
4. look for record match in table C by ID value from step 3, if didn't find any record, insert it, else increase seed value by 1 then repeat step 3
The issue is that in a certain value range a huge block of values may already exist in table C's ID column, e.g. A3000 to A500000 are all taken, and the database query is extremely slow if it follows the existing logic. I need to figure out logic that smartly gets the minimum available number (without the prefix).
it is hard to describe, hope this make more sense, I truly appreciate any help on this Thanks in advance!
This should do the trick. It is a simple self-contained example that will run in SSMS. I even put the values out of order just in case. You would just put your table where @Data is and replace 'Id' with your identifier field.
declare @Data table ( Id varchar(3) );
insert into @Data values ('A5'),('A2'),('B1'),('A3'),('B2'),('A4'),('A1'),('A6');
With a as
(
Select
ID
, cast(right(Id, len(Id)-1) as int) as Pos
, left(Id, 1) as TableFrom
from @Data
)
select
TableFrom
, max(Pos) + 1 as NextNumberUp
from a
group by TableFrom
EDIT: If you want to not worry about production data you could add this last part amending what I wrote:
Select
TableFrom
, max(Pos) as LastPos
into #Temp
from a
group by TableFrom
select TableFrom, LastPos + 1
from #Temp
Regardless if this was production environment you are going to have to hit part of it at some time to get data. If the datasets are not too large and just varchar(256) or less and only 5 million rows or less you could dump that entire column from tableC to a temp table. Honestly query performance versus imports change vastly from system to system.
Following your design there shouldn't be any duplicates in Table C considering that A and B are unique.
A | B | C
1 1 A1
2 2 A2
B1
B2
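If the goal is the smallest free number at or above the seed, rather than max + 1, one option is to load the existing IDs for a prefix once and search in memory instead of querying table C on every retry. A minimal Python sketch of that idea, assuming the IDs fit in memory; next_free_number is a hypothetical helper, not part of the stored procedure:

```python
def next_free_number(existing_ids, prefix, seed):
    """Smallest number >= seed such that prefix + number is not already used."""
    used = {
        int(value[len(prefix):])
        for value in existing_ids
        # Keep only IDs of the form <prefix><digits>.
        if value.startswith(prefix) and value[len(prefix):].isdigit()
    }
    number = seed
    while number in used:  # set lookup, no database round trip per step
        number += 1
    return number


# Same sample values as in the T-SQL example above.
ids = ["A5", "A2", "B1", "A3", "B2", "A4", "A1", "A6"]
print(next_free_number(ids, "A", 2))
print(next_free_number(ids, "B", 1))
```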
I have a table named Books which contains some columns.
ColumnNames: BookId, BookName, BookDesc, xxx
I want to track changes for certain columns. I don't have to maintain history of old value and new value. I just want to track that value is changed or not.
What is the best way to achieve this?
1) Create Books table as:
ColumnNames: BookId, BookName, BookName_Changed_Flag, BookDesc, BookDesc_Changed_Flag,
xxx, xxx_Changed_Flag?
2) Create a separate table Books_Change_Log exactly like Books table but only with track change columns as:
ColumnNames: BookId, BookName_Changed_Flag, BookDesc_Changed_Flag, xxx_Changed_Flag?
Please advise.
--Update--
There are more than 20 columns in each table, and each column represents a certain element in the UI. If a column value is ever changed from its original record, I need to display the UI element that represents the column value in a different style. The rest of the elements should appear normal.
How to use a bitfield in TSQL (for updates and reads)
Set the bitfield to default to 0 at the start (meaning no changes). Use type int for up to 32 bits of data and bigint for up to 64 bits of data.
To set a bit in a bit field use the | (bit OR operator) in the update statement, for example
UPDATE table
SET field1 = 'new value', bitfield = bitfield | 1
UPDATE table
SET field2 = 'new value', bitfield = bitfield | 2
And so on: for the Nth field, use 2 to the power of N-1 as the value after the |
To read a bit field use & (bit AND operator) and see if it is true, for example
SELECT field1, field2,
CASE WHEN (bitfield & 1) = 1 THEN 'field1 mod' ELSE 'field1 same' END,
CASE WHEN (bitfield & 2) = 2 THEN 'field2 mod' ELSE 'field2 same' END
FROM table
note I would probably not use text since this will be used by an application, something like this will work
SELECT field1, field2,
CASE WHEN (bitfield & 1) = 1 THEN 1 ELSE 0 END AS [field1flag],
CASE WHEN (bitfield & 2) = 2 THEN 1 ELSE 0 END AS [field2flag]
FROM table
or you can use != 0 above to make it simple as I did in my test below
Have to actually test to not have errors, click for the test script
original answer:
If you have less than 16 columns in your table you could store the "flags" as an integer then use the bit flag method to indicate the columns that changed. Just ignore or don't bother marking the ones that you don't care about.
Thus if flagfield bitwise AND 2^(N-1) is non-zero it indicates that the Nth field changed.
Or an example for max of N = 2
0 - nothing has changed (all bits 0)
1 - field 1 changed (first bit 1)
2 - field 2 changed (second bit 1)
3 - field 1+2 changed (first and second bit 1)
see this link for a better definition: http://en.wikipedia.org/wiki/Bit_field
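The same set-with-OR and test-with-AND logic can be tried outside the database. A minimal Python sketch, where the FIELD_BITS mapping and the helper names are assumptions made for this example (field N uses the value 2^(N-1), as described above):

```python
# Hypothetical mapping of column name to its bit: field N -> 2 ** (N - 1).
FIELD_BITS = {"field1": 1 << 0, "field2": 1 << 1, "field3": 1 << 2}


def mark_changed(bitfield, field):
    """Return the bitfield with the bit for 'field' switched on (bitwise OR)."""
    return bitfield | FIELD_BITS[field]


def has_changed(bitfield, field):
    """True if the bit for 'field' is set (bitwise AND is non-zero)."""
    return (bitfield & FIELD_BITS[field]) != 0


flags = 0                              # nothing has changed
flags = mark_changed(flags, "field1")  # flags is now 1
flags = mark_changed(flags, "field2")  # flags is now 3
changed = [name for name in FIELD_BITS if has_changed(flags, name)]
print(flags, changed)
```

Setting a bit that is already set leaves the value unchanged, so repeated updates to the same column are harmless.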
I know you said you don't need it, but sometimes it's just easier to use something off the shelf which does everything, like: http://autoaudit.codeplex.com/
This just adds a few columns to your table and is not nearly as invasive as either of your proposed schemas, and the trigger necessary to track the changes are also generated by the tool.
You should have a log table that stores the BookId and the date of the change (you don't need those other columns - as you stated, you don't need the old and new values, and you can always get the current value for name, description etc. from the Books table, no reason to store it twice). Unless you are only interested in the last time it changed. You can populate the log table with a simple for update trigger on the books table. For example with the new information you've provided:
CREATE TABLE dbo.BookLog
(
BookID INT PRIMARY KEY,
NameHasChanged BIT NOT NULL DEFAULT 0,
DescriptionHasChanged BIT NOT NULL DEFAULT 0
--, ... 18 more columns
);
GO
CREATE TRIGGER dbo.CreateBook
ON dbo.Books FOR INSERT
AS
BEGIN
SET NOCOUNT ON;
INSERT dbo.BookLog(BookID) SELECT BookID FROM inserted;
END
GO
CREATE TRIGGER dbo.ModifyBook
ON dbo.Books FOR UPDATE
AS
BEGIN
SET NOCOUNT ON;
UPDATE t SET
t.NameHasChanged = CASE WHEN i.name <> d.name
THEN 1 ELSE t.NameHasChanged END,
t.DescriptionHasChanged = CASE WHEN i.description <> d.description
THEN 1 ELSE t.DescriptionHasChanged END
--, 18 more of these assuming all can be compared with simple <> ...
FROM dbo.BookLog AS t
INNER JOIN inserted AS i ON i.BookID = t.BookID
INNER JOIN deleted AS d ON d.BookID = i.BookID;
END
GO
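For experimenting with this pattern without a SQL Server instance, here is a rough equivalent in SQLite, driven from Python's sqlite3 module. It is a sketch under assumptions, not a translation of the T-SQL: SQLite triggers fire per row and use OLD/NEW instead of the inserted/deleted tables, the column names BookName and BookDesc are taken from the question, and IS NOT is used so that changes to or from NULL are also detected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Books (BookId INTEGER PRIMARY KEY, BookName TEXT, BookDesc TEXT);
CREATE TABLE BookLog (
    BookId INTEGER PRIMARY KEY,
    NameHasChanged INTEGER NOT NULL DEFAULT 0,
    DescriptionHasChanged INTEGER NOT NULL DEFAULT 0
);
-- Create the log row together with the book.
CREATE TRIGGER CreateBook AFTER INSERT ON Books
BEGIN
    INSERT INTO BookLog (BookId) VALUES (NEW.BookId);
END;
-- Once a flag is 1 it stays 1; it is only raised when the value really differs.
CREATE TRIGGER ModifyBook AFTER UPDATE ON Books
BEGIN
    UPDATE BookLog SET
        NameHasChanged = CASE WHEN NEW.BookName IS NOT OLD.BookName
                              THEN 1 ELSE NameHasChanged END,
        DescriptionHasChanged = CASE WHEN NEW.BookDesc IS NOT OLD.BookDesc
                                     THEN 1 ELSE DescriptionHasChanged END
    WHERE BookId = NEW.BookId;
END;
""")

conn.execute("INSERT INTO Books VALUES (1, 'Dune', 'old desc')")
conn.execute("INSERT INTO Books VALUES (2, 'Emma', 'desc')")
conn.execute("UPDATE Books SET BookName = 'Dune Messiah' WHERE BookId = 1")
conn.execute("UPDATE Books SET BookDesc = 'desc' WHERE BookId = 2")  # same value
log = conn.execute(
    "SELECT BookId, NameHasChanged, DescriptionHasChanged FROM BookLog ORDER BY BookId"
).fetchall()
print(log)
```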
I can guarantee you that after you deliver this solution, one of the next requests is going to be "show me what it was before". Just go ahead and have a history table. That will solve your current problem AND your future problem. It is a pretty standard design on non-trivial systems.
Put two datetime columns in your table, "created_at" and "updated_at". Default both to current_timestamp. Only ever set the value of updated_at if you are changing the data in the row. You can enforce this with a trigger on the table that checks to see if any of the column values are changing, and then updates "updated_at" if so.
When you want to check if a row has ever changed, just check if updated_at > created_at.
I know that the value itself for a RowVersion column is not in and of itself useful, except that it changes each time the row is updated. However, I was wondering if they are useful for relative (inequality) comparison.
If I have a table with a RowVersion column, are either of the following true:
Will all updates that occur simultaneously (either same update statement or same transaction) have the same value in the RowVersion column?
If I do update "A", followed by update "B", will the rows involved in update "B" have a higher value than the rows involved in update "A"?
Thanks.
From MSDN:
Each database has a counter that is incremented for each insert or update operation that is performed on a table that contains a rowversion column within the database. This counter is the database rowversion. This tracks a relative time within a database, not an actual time that can be associated with a clock. Every time that a row with a rowversion column is modified or inserted, the incremented database rowversion value is inserted in the rowversion column.
http://msdn.microsoft.com/en-us/library/ms182776.aspx
As far as I understand, nothing ACTUALLY happens simultaneously in the system. This means that all rowversions should be unique. I venture to say that they would be effectively useless if duplicates were allowed within the same table. Also giving credence to rowversions not being duplicated is MSDN's stance on not using them as primary keys, not because it would cause violations, but because it would cause foreign key issues.
According to MSDN, "The rowversion data type is just an incrementing number..." so yes, later is larger.
To the question of how much it increments, MSDN states, "[rowversion] tracks a relative time within a database" which indicates that it is not a fluid integer incrementing, but time based. However, this "time" reveals nothing of when exactly, but rather when in relation to other rows a row was inserted/modified.
Some additional information.
RowVersion converts nicely to bigint and thus one can display better readable output when debugging:
CREATE TABLE [dbo].[T1](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Value] [nvarchar](50) NULL,
[RowVer] [timestamp] NOT NULL
)
insert into t1 ([value]) values ('a')
insert into t1 ([value]) values ('b')
insert into t1 ([value]) values ('c')
select Id, Value,CONVERT(bigint,rowver)as RowVer from t1
update t1 set [value] = 'x' where id = 3
select Id, Value,CONVERT(bigint,rowver)as RowVer from t1
update t1 set [value] = 'y'
select Id, Value,CONVERT(bigint,rowver)as RowVer from t1
Id Value RowVer
1 a 2037
2 b 2038
3 c 2039
Id Value RowVer
1 a 2037
2 b 2038
3 x 2040
Id Value RowVer
1 y 2041
2 y 2042
3 y 2043
I spent ages trying to sort something out with this: asking for rows updated after a particular sequence number. The timestamp is really just a sequence number. It is also big-endian, whereas C# functions like BitConverter.ToInt64 expect little-endian.
I ended up creating a db view on the table I want data from, with an alias column 'SequenceNo'
SELECT ID, CONVERT(bigint, Timestamp) AS SequenceNo
FROM dbo.[User]
C# Code First sees the view (i.e. UserV) identically to a normal table,
then in my LINQ I can join the view and parent table and compare with a sequence number
var users = (from u in context.GetTable<User>()
join uv in context.GetTable<UserV>() on u.ID equals uv.ID
where mysequenceNo < uv.SequenceNo
orderby uv.SequenceNo
select u).ToList();
to get what I want - all the entries changed since the last time I checked.
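The byte-order point above is easy to check in isolation. A minimal Python sketch, assuming the client receives the rowversion as 8 raw bytes; rowversion_to_int is a hypothetical helper, and the sample bytes were chosen to correspond to the value 2040 from the earlier output:

```python
def rowversion_to_int(raw: bytes) -> int:
    """Interpret an 8-byte rowversion as a big-endian unsigned integer."""
    if len(raw) != 8:
        raise ValueError("rowversion must be exactly 8 bytes")
    return int.from_bytes(raw, byteorder="big")


raw = bytes.fromhex("00000000000007F8")
as_big = rowversion_to_int(raw)
# Reading the same bytes as little-endian gives a completely different number.
as_little = int.from_bytes(raw, byteorder="little")
print(as_big, as_little)
```

With the conversion done consistently, the resulting integers can be compared directly to find rows changed after a known sequence number.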
What makes you think Timestamp data types are evil? The data type is very useful for concurrency checking. Linq-To-SQL uses this data type for this very purpose.
The answers to your questions:
1) No. This value is updated each time the row is updated. If you are updating the row say five times, each update will increment the Timestamp value. Of course, you realize that updates that "occur simultaneously" really don't. They still only occur one at a time, in turn.
2) Yes.
Just as a note, timestamp is deprecated in SQL Server 2008 onwards. rowversion should be used instead.
From this page on MSDN:
The timestamp syntax is deprecated. This feature will be removed in a
future version of Microsoft SQL Server. Avoid using this feature in
new development work, and plan to modify applications that currently
use this feature.
Rowversion does break one of the "idealistic" approaches of SQL - that an UPDATE statement is a single, atomic action, and acts as if all UPDATEs (both to all columns within a row, and all rows within the table) occur "at the same time". But in this case, with Rowversion, it is possible to determine that one row was updated at a slightly different time than another.
Note that the order in which rows are updated (by a single update statement) is not guaranteed - it may, by coincidence follow the same order as the clustered key for the table, but I wouldn't count on that being true.
To answer part of your question: you can end up with duplicate values according to MSDN:
Duplicate rowversion values can be generated by using the SELECT INTO
statement in which a rowversion column is in the SELECT list. We do
not recommend using rowversion in this manner.
Source: rowversion (Transact-SQL)
Every database has a counter that is incremented one by one on every data modification that is done in the database. If the table containing the affected (by update/insert) row contains a timestamp/rowversion column, the current counter value of the database is stored in that column of the updated/inserted record.
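A toy model can make the counter behaviour concrete. The Python sketch below is not how SQL Server is implemented; ToyDatabase is an invented class that only mimics the described rule (one database-wide counter, a fresh value stamped on every inserted or updated row), with a starting value picked to match the sample output earlier in this thread.

```python
import itertools


class ToyDatabase:
    """Toy model: one shared counter stamps every inserted or updated row."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self.rows = {}

    def insert(self, row_id, value):
        self.rows[row_id] = {"value": value, "rowver": next(self._counter)}

    def update(self, row_ids, value):
        # Even within one UPDATE statement, each row gets its own new value.
        for row_id in row_ids:
            self.rows[row_id] = {"value": value, "rowver": next(self._counter)}


db = ToyDatabase(start=2037)
for row_id, value in [(1, "a"), (2, "b"), (3, "c")]:
    db.insert(row_id, value)
db.update([3], "x")        # a single-row update
db.update([1, 2, 3], "y")  # one statement touching all rows
versions = [db.rows[i]["rowver"] for i in (1, 2, 3)]
print(versions)
```

In this model, rows touched by a later update always end up with higher values than rows touched by an earlier one, and no two rows share a value.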