Most efficient way to auto-hookup a foreign key in LINQ

I have two tables for tracking user sessions on my site. This is a gross oversimplification btw :
Campaign:
campaignId [int]
campaignKey [varchar(20)]
description [varchar(50)]
Session:
sessionDate [datetime]
sessionGUID [uniqueidentifier]
campaignId [int]
campaignKey [varchar(20)]
I want to insert a new record into Session, using LINQ :
var s = new Session();
dbContext.Session.InsertOnSubmit(s);
s.sessionDate = DateTime.Now;
s.sessionGUID = Guid.NewGuid();
s.campaignKey = Request.Params["c"];
// dont set s.campaignId here...
dbContext.SubmitChanges();
Notice that I am not currently setting campaignId in this code.
What I want is for something to automatically hook up the foreign key to the 'Campaign' table, and whatever does it must first add a new row to the 'Campaign' table if the campaign passed in has never been used before.
I have a few decisions to make and would really appreciate insight on any of them :
I don't know if I should use a trigger, a stored proc or do it in LINQ manually :
Trigger: slightly icky, never really liked using them, but would guarantee the 'campaignId' was updated by the time I need it
Stored proc: again slightly icky, my SQL is not great and I value the consistency of being able to do everything in LINQ as much as possible.
LINQ manually: I'd have to keep a copy of the 'Campaign' table in memory and use a C# hashtable to do the lookup. I'd then have to worry about keeping that table up to date if another client added a new campaign.
My two main reasons for wanting this foreign key table: the first is a more efficient index on 'Session' for 'campaignId' so that I can do grouping faster. It seems like it ought to be a lot faster if it's just an integer column being grouped. The second reason is to give partners permission to see only their campaigns through joins with other tables.
Before anyone asks I do NOT know the campaigns in advance, as some will be created by partners.
Most importantly: I am primarily looking for the most 'LINQ friendly' solution.

I would definitely recommend adding a nullable foreign key constraint on the Session table. Once you have that setup, it should be as simple as tacking on a method to the Session class:
public partial class Session
{
    // Assumes dbContext is a DataContext instance reachable from this class
    public void SetCampaignKey(string key)
    {
        // Use an existing campaign if one exists (null when nothing matches)
        Campaign campaign =
            (from c in dbContext.Campaign
             where c.campaignKey == key
             select c).FirstOrDefault();
        // Create a new campaign if necessary
        if (campaign == null)
        {
            campaign = new Campaign();
            campaign.campaignKey = key;
            campaign.description = string.Empty; // Not sure where this comes in
            dbContext.Campaign.InsertOnSubmit(campaign);
        }
        // We can now set the reference directly
        this.Campaign = campaign;
    }
}
My LINQ may be a bit off, but something like this should work.
You can call SetCampaignKey() instead of manually setting the campaignKey property. When you call dbContext.SubmitChanges, the campaign will be added if necessary and the Session entry will be updated accordingly.
In this case, only the campaignId property would be set automatically. You could rely on a simple trigger to set campaignKey or do away with it. You could always retrieve the value by joining on the Campaign table.
Am I oversimplifying the problem?
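For readers who want to see the get-or-create pattern outside LINQ to SQL, here is a minimal Python sketch using the standard sqlite3 module. It is not the LINQ API: the helper names (get_or_create_campaign_id, insert_session), the UNIQUE constraint on campaignKey and the in-memory database are all assumptions made for illustration. The table and column names follow the question.

```python
import sqlite3
import uuid
from datetime import datetime, timezone


def get_or_create_campaign_id(conn, key):
    """Return the campaignId for `key`, inserting a Campaign row if none exists."""
    row = conn.execute(
        "SELECT campaignId FROM Campaign WHERE campaignKey = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO Campaign (campaignKey, description) VALUES (?, '')", (key,)
    )
    return cur.lastrowid  # rowid generated by this INSERT


def insert_session(conn, campaign_key):
    """Insert a Session row whose foreign key is resolved from the campaign key."""
    campaign_id = get_or_create_campaign_id(conn, campaign_key)
    conn.execute(
        "INSERT INTO Session (sessionDate, sessionGUID, campaignId) VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), str(uuid.uuid4()), campaign_id),
    )
    return campaign_id


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Campaign (
        campaignId  INTEGER PRIMARY KEY,
        campaignKey TEXT NOT NULL UNIQUE,  -- assumption: keys are unique
        description TEXT NOT NULL
    );
    CREATE TABLE Session (
        sessionDate TEXT NOT NULL,
        sessionGUID TEXT NOT NULL,
        campaignId  INTEGER REFERENCES Campaign(campaignId)
    );
    """
)

first = insert_session(conn, "spring-sale")   # creates the campaign
second = insert_session(conn, "spring-sale")  # reuses it
third = insert_session(conn, "partner-42")    # creates another one
campaign_count = conn.execute("SELECT COUNT(*) FROM Campaign").fetchone()[0]
```

With several writers, two clients could both miss the lookup and try to insert the same key. A unique constraint on the key turns that race into an error the caller can catch and retry.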

Related

Will Slick 2.x simplify inserts into tables with auto-generated primary keys?

In Slick 1.x, inserting into a table with an auto generated primary key was kind of complicated: you had to manually create a table projection that omitted the pk for insert purposes. It looks like Slick 2.x will fix this problem:
Soft inserts are now the default, i.e. AutoInc columns are automatically skipped when inserting with +=, ++=, insert and insertAll. This means that you no longer need separate projections (without the primary key) for inserts.
However, the 2.x docs do not seem to have been updated yet:
While some database systems allow inserting proper values into AutoInc columns or inserting None to get a created value, most databases forbid this behaviour, so you have to make sure to omit these columns. Slick does not yet have a feature to do this automatically but it is planned for a future release. For now, you have to use a query with a custom projection which does not include the AutoInc column
Does anyone know the new 2.0 syntax for doing an insert into a table with AutoInc and get the generated key back?

The syntax for inserts is the same as in 1.0, only that now autoinc columns are automatically ignored. So there is a semantic change in what .insert does. If you want the old behavior (where they are included) you have to call .forceInsert.

You can retrieve the generated value like this:
case class Employee(empName: String, empType: String, empId: Int = 0)
class Employees(tag: Tag) extends Table[Employee](tag, "emp") {
  def empId = column[Int]("id", O.PrimaryKey, O.AutoInc)
  def empName = column[String]("name", O.DBType("VARCHAR(100)"))
  def empType = column[String]("type")
  def * = (empName, empType, empId) <> (Employee.tupled, Employee.unapply)
}
val employees = TableQuery[Employees]
// Insert only the non-AutoInc columns and ask for the generated id back
val myInsert = employees.map(e => (e.empName, e.empType)) returning employees.map(_.empId)
val autoGeneratedKey = myInsert.insert("satendra", "permanent")

Stored procedure to update a table based on data from table parameter

I want to begin by stating I'm an SQL noob, so I'd appreciate any suggestions or comments on my workflow and/or mindset when trying to solve this issue.
What I'm doing is gathering usage statistics about several applications, in several categories (not all categories necessarily apply to all applications), storing them in a database.
I've set up a few tables to do that, and then one table to link everything together that's structured like so (from now on: Dtable):
(column name - details)
UserID - foreign key to another table which stores users data
ApplicationID - foreign key to another table which stores applications data
CategoryID - foreign key to another table which holds a list of different categories
Value - the actual data
Each application gathers the data, then submits it to the database using a stored procedure. As the amount of data can be different based on actual usage (not always sending every category) and for each application, I was thinking of sending the data as a DataTable with a list of CategoryID and Value so I won't have to call a procedure for every individual category (Ptable).
I need to update each record in Dtable to the correct value in Ptable according to CategoryID, but also filtered by UserID and ApplicationID. UserID and ApplicationID will be given as two other parameters to the Stored Procedure. Ptable only contains a list of CategoryID / Value records.
Now, I read about Cursors (for each record in the table parameter set the relevant data in the database table), but the consensus seems to be "Avoid at all costs".
How would I go about updating the table, then, based on the varying records in Ptable?
P.S.
The tables are structured like so to keep agility and scalability in adding more categories/applications in the future. If there's a better way to do it I'll be happy to know.

I believe the update statement would look something like this, where @ApplicationID and @UserID are the stored proc's other parameters:
update Dtable
set Dtable.Value = p.Value
from Ptable p
where Dtable.UserID = @UserID
and Dtable.ApplicationID = @ApplicationID
and Dtable.CategoryID = p.CategoryID;
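To show the same set-based idea end to end without a cursor, here is a small Python sketch against SQLite rather than SQL Server. The temporary table stands in for the table-valued parameter, and a correlated subquery replaces the UPDATE ... FROM form. The function name apply_usage and the sample data are assumptions for illustration only.

```python
import sqlite3


def apply_usage(conn, user_id, application_id, values):
    """Update Dtable.Value for one user/application from (CategoryID, Value) pairs.

    Returns the number of rows updated.
    """
    # The temp table plays the role of the Ptable parameter.
    conn.execute(
        "CREATE TEMP TABLE Ptable (CategoryID INTEGER PRIMARY KEY, Value INTEGER)"
    )
    conn.executemany("INSERT INTO Ptable VALUES (?, ?)", values)
    cur = conn.execute(
        """
        UPDATE Dtable
           SET Value = (SELECT p.Value FROM Ptable p
                         WHERE p.CategoryID = Dtable.CategoryID)
         WHERE UserID = ?
           AND ApplicationID = ?
           AND CategoryID IN (SELECT CategoryID FROM Ptable)
        """,
        (user_id, application_id),
    )
    updated = cur.rowcount
    conn.execute("DROP TABLE Ptable")
    return updated


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Dtable (UserID INTEGER, ApplicationID INTEGER, "
    "CategoryID INTEGER, Value INTEGER)"
)
conn.executemany(
    "INSERT INTO Dtable VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 0), (1, 10, 101, 0), (1, 10, 102, 7), (2, 10, 100, 0)],
)

updated = apply_usage(conn, 1, 10, [(100, 5), (101, 9)])
rows = conn.execute(
    "SELECT UserID, CategoryID, Value FROM Dtable ORDER BY UserID, CategoryID"
).fetchall()
```

Only the two rows for user 1 and application 10 whose categories appear in the parameter table change. Category 102 and the other user's row are left alone.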

How to retrieve the last autoincremented ID from a SQLite table?

I have a table Messages with columns ID (primary key, autoincrement) and Content (text).
I have a table Users with columns username (primary key, text) and Hash.
A message is sent by one Sender (user) to many recipients (user) and a recipient (user) can have many messages.
I created a table Messages_Recipients with two columns: MessageID (referring to the ID column of the Messages table) and Recipient (referring to the username column in the Users table). This table represents the many-to-many relation between recipients and messages.
So, the question I have is this. The ID of a new message will be created after it has been stored in the database. But how can I hold a reference to the MessageRow I just added in order to retrieve this new MessageID? I can always search the database for the last row added of course, but that could possibly return a different row in a multithreaded environment?
EDIT: As I understand it for SQLite you can use the SELECT last_insert_rowid(). But how do I call this statement from ADO.Net?
My Persistence code (messages and messagesRecipients are DataTables):
public void Persist(Message message)
{
    pm_databaseDataSet.MessagesRow messagerow;
    messagerow = messages.AddMessagesRow(message.Sender,
        message.TimeSent.ToFileTime(),
        message.Content,
        message.TimeCreated.ToFileTime());
    UpdateMessages();
    var x = messagerow; // I hoped the messagerow would hold a
                        // reference to the new row in the Messages table, but it does not.
    foreach (var recipient in message.Recipients)
    {
        var row = messagesRecipients.NewMessages_RecipientsRow();
        row.Recipient = recipient;
        //row.MessageID= How do I find this??
        messagesRecipients.AddMessages_RecipientsRow(row);
        UpdateMessagesRecipients(); // method not shown
    }
}
private void UpdateMessages()
{
    messagesAdapter.Update(messages);
    messagesAdapter.Fill(messages);
}

One other option is to look at the system table sqlite_sequence. Your sqlite database will have that table automatically if you created any table with autoincrement primary key. This table is for sqlite to keep track of the autoincrement field so that it won't repeat the primary key even after you delete some rows or after some insert failed (read more about this here http://www.sqlite.org/autoinc.html).
So with this table there is the added benefit that you can find out your newly inserted item's primary key even after you inserted something else (in other tables, of course!). After making sure that your insert is successful (otherwise you will get a false number), you simply need to do:
select seq from sqlite_sequence where name='table_name'
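A quick way to see what this query returns is the following Python sketch with the standard sqlite3 module. The tables and the helper name last_autoincrement are made up for the demonstration. Note that the sqlite_sequence table only exists once some table has been declared with the AUTOINCREMENT keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Messages (ID INTEGER PRIMARY KEY AUTOINCREMENT, Content TEXT)")
conn.execute("CREATE TABLE Other (ID INTEGER PRIMARY KEY AUTOINCREMENT, Note TEXT)")
conn.execute("INSERT INTO Messages (Content) VALUES ('hello')")
conn.execute("INSERT INTO Messages (Content) VALUES ('world')")
# An insert into a different table does not disturb the Messages counter.
conn.execute("INSERT INTO Other (Note) VALUES ('unrelated insert')")


def last_autoincrement(conn, table_name):
    """Return the largest AUTOINCREMENT value handed out for table_name, or None."""
    row = conn.execute(
        "SELECT seq FROM sqlite_sequence WHERE name = ?", (table_name,)
    ).fetchone()
    return None if row is None else row[0]


messages_seq = last_autoincrement(conn, "Messages")
missing_seq = last_autoincrement(conn, "NoSuchTable")
```

The counter is per table, not per connection, so another writer inserting into the same table between your INSERT and this query would still change the value you read.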

With SQL Server you'd SELECT SCOPE_IDENTITY() to get the last identity value generated in the current session and scope.
With SQLite, it looks like for an autoincrement column you would do
SELECT last_insert_rowid()
immediately after your insert.
http://www.mail-archive.com/sqlite-users@sqlite.org/msg09429.html
In answer to your comment: to get this value from ADO.NET, run the query on the same connection that performed the insert, for example with the System.Data.SQLite provider:
// connection is assumed to be the open SQLiteConnection that did the INSERT;
// last_insert_rowid() is tracked per connection, so a fresh connection returns 0.
using (SQLiteCommand cmd = new SQLiteCommand("SELECT last_insert_rowid()", connection))
{
    // SQLite rowids are 64-bit, so the scalar comes back as a long
    long lastID = (long) cmd.ExecuteScalar();
}
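The same idea applied to the Messages / Messages_Recipients tables from the question, as a minimal Python sketch. Python's sqlite3 cursor exposes the value of last_insert_rowid() for its own INSERT as cursor.lastrowid. The function name persist_message and the sample data are assumptions, and the table definitions are reduced to the columns that matter here.

```python
import sqlite3


def persist_message(conn, content, recipients):
    """Insert one message and its recipient rows; return the new message ID."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute("INSERT INTO Messages (Content) VALUES (?)", (content,))
        message_id = cur.lastrowid  # id generated by the INSERT above
        conn.executemany(
            "INSERT INTO Messages_Recipients (MessageID, Recipient) VALUES (?, ?)",
            [(message_id, recipient) for recipient in recipients],
        )
    return message_id


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Messages (ID INTEGER PRIMARY KEY AUTOINCREMENT, Content TEXT);
    CREATE TABLE Messages_Recipients (
        MessageID INTEGER REFERENCES Messages(ID),
        Recipient TEXT
    );
    """
)

first_id = persist_message(conn, "hello", ["alice", "bob"])
second_id = persist_message(conn, "status update", ["carol"])
links = conn.execute(
    "SELECT MessageID, Recipient FROM Messages_Recipients ORDER BY MessageID, Recipient"
).fetchall()
```

The important detail is that the id is read right after the INSERT, on the connection that performed it, before anything else is inserted on that connection.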

I've had issues with using SELECT last_insert_rowid() in a multithreaded environment. If another thread inserts into another table that has an autoinc column on the same connection, last_insert_rowid will return the autoinc value from that other table.
Here's where the documentation states that:
If a separate thread performs a new INSERT on the same database connection while the sqlite3_last_insert_rowid() function is running and thus changes the last insert rowid, then the value returned by sqlite3_last_insert_rowid() is unpredictable and might not equal either the old or the new last insert rowid.
That's from the sqlite.org documentation.

According to Android Sqlite get last insert row id there is another query:
SELECT rowid from your_table_name order by ROWID DESC limit 1

Sample code for @polyglot's solution:
// connection is assumed to be an open SQLiteConnection
SQLiteCommand sql_cmd = connection.CreateCommand();
sql_cmd.CommandText = "select seq from sqlite_sequence where name='myTable'; ";
int newId = Convert.ToInt32( sql_cmd.ExecuteScalar( ) );

sqlite3_last_insert_rowid() is unsafe in a multithreaded environment (and documented as such by SQLite).
The good news is that you can play the odds, as described below.
ID reservation is NOT implemented in SQLite. You can also avoid the internal PK by using your own UNIQUE primary key if you know of something in your data that always varies.
Note:
First check whether the RETURNING clause solves your issue:
https://www.sqlite.org/lang_returning.html
As this is only available in recent versions of SQLite and may have some overhead, consider relying on the fact that it takes really bad luck for another insertion to land in between your requests to SQLite.
Also, if you absolutely need to fetch SQLite's internal PK, see whether you can design your own predictable PK instead:
https://sqlite.org/withoutrowid.html
If you need a traditional AUTOINCREMENT PK, then yes, there is a small risk that the id you fetch belongs to another insertion. A small but unacceptable risk.
A workaround is to call sqlite3_last_insert_rowid() twice:
#1 BEFORE the insert, then #2 AFTER the insert,
as in:
sqlite3_int64 IdLast = sqlite3_last_insert_rowid(m_db); // Before the insert (this id is already used)
const int rc = sqlite3_exec(m_db, sql, NULL, NULL, &m_zErrMsg);
sqlite3_int64 IdEnd = sqlite3_last_insert_rowid(m_db); // After the insert: most probably the right one
In the vast majority of cases IdEnd==IdLast+1. This is the "happy path", and you can rely on IdEnd being the ID you are looking for.
Otherwise you need to do an extra SELECT with criteria based on the range IdLast to IdEnd (add any additional criteria you have to the WHERE clause).
Use ROWID (which is an SQLite keyword) to SELECT the id range that is relevant.
"SELECT my_pk_id FROM Symbols WHERE ROWID > %lld AND ROWID <= %lld;", (long long)IdLast, (long long)IdEnd);
// notice the strict > on the lower bound, as we already know that IdLast is NOT the one we are looking for.
As the second call to sqlite3_last_insert_rowid() is made right after the INSERT, this SELECT generally returns only 2 or 3 rows at most.
Then search the SELECT results for the data you inserted to find the proper id.
Performance improvement: the call to sqlite3_last_insert_rowid() is far faster than the INSERT (a mutex may occasionally make that untrue, but statistically it holds), so I bet on IdEnd being the right one and walk the SELECT results from the end. In nearly every case we tested, the last row did contain the ID we were looking for.
Performance improvement: if you have an additional UNIQUE key, add it to the WHERE clause to get only one row.
I experimented with 3 threads doing heavy insertions, and it worked as expected. Preparation plus DB handling take the vast majority of CPU cycles, so the odds of an ID mix-up (the situation where IdEnd>IdLast+1) are in the range of 1 in 1000 insertions.
So the penalty of an additional SELECT to resolve this is rather low.
In other words, the benefit of using sqlite3_last_insert_rowid() is great for the vast majority of insertions, and with some care it can even be used safely in a multithreaded program.
Caveat: the situation is slightly more awkward in transactional mode.
Also, SQLite does not explicitly guarantee that IDs will be contiguous and increasing (unless AUTOINCREMENT is used). (At least I did not find information about that, but looking at the SQLite source code, it appears to preclude that.)

The simplest method would be:
SELECT MAX(id) FROM yourTableName LIMIT 1;
If you need this last id in order to affect another table (for example: if an invoice is added, THEN add the items list under that invoice ID), use something like:
var cmd_result = cmd.ExecuteNonQuery(); // returns the number of affected rows
Then use cmd_result to determine whether the previous query executed successfully, something like if (cmd_result > 0), followed by your query SELECT MAX(id) FROM yourTableName LIMIT 1; to make sure that you are not targeting the wrong row id in case the previous command did not add any rows.
In fact, the cmd_result > 0 check is essential in case anything fails. Especially if you are developing a serious application, you don't want your users waking up to find random items added to their invoices.

I recently came up with a solution to this problem that sacrifices some performance overhead to ensure you get the correct last inserted ID.
Let's say you have a table people. Add a column called random_bigint:
create table people (
    -- INTEGER PRIMARY KEY makes id an alias for the rowid, so SQLite assigns it automatically
    id integer primary key,
    name text,
    random_bigint int not null
);
Add a unique index on random_bigint:
create unique index people_random_bigint_idx
ON people(random_bigint);
In your application, generate a random bigint whenever you insert a record. There is a negligible but nonzero chance of a collision, so you should handle that error.
My app is in Go and the code that generates a random bigint looks like this:
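import (
    "crypto/rand"
    "math/big"
)
// RandomPositiveBigInt returns a cryptographically random value in [0, 2^63-1).
func RandomPositiveBigInt() (int64, error) {
    nBig, err := rand.Int(rand.Reader, big.NewInt(9223372036854775807))
    if err != nil {
        return 0, err
    }
    return nBig.Int64(), nil
}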
After you've inserted the record, query the table with a where filter on the random bigint value:
select id from people where random_bigint = <put random bigint here>
The unique index will add a small amount of overhead on the insertion. The id lookup, while very fast because of the index, will also add a little overhead.
However, this method will guarantee a correct last inserted ID.
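Putting the steps of this answer together, here is a minimal Python sketch (Python instead of Go, using only the standard library). The function name insert_person, the retry count and the in-memory database are assumptions for illustration. The table follows the answer, with id declared INTEGER PRIMARY KEY so that SQLite assigns it.

```python
import secrets
import sqlite3


def insert_person(conn, name, max_attempts=3):
    """Insert a row tagged with a random token, then look its id up by that token."""
    for _ in range(max_attempts):
        token = secrets.randbelow(2**63)  # 0 <= token < 2**63, fits a signed 64-bit column
        try:
            conn.execute(
                "INSERT INTO people (name, random_bigint) VALUES (?, ?)",
                (name, token),
            )
        except sqlite3.IntegrityError:
            continue  # unique-index collision: retry with a fresh token
        row = conn.execute(
            "SELECT id FROM people WHERE random_bigint = ?", (token,)
        ).fetchone()
        return row[0]
    raise RuntimeError("could not generate a unique token")


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (
        id INTEGER PRIMARY KEY,
        name TEXT,
        random_bigint INT NOT NULL
    );
    CREATE UNIQUE INDEX people_random_bigint_idx ON people(random_bigint);
    """
)

alice_id = insert_person(conn, "alice")
bob_id = insert_person(conn, "bob")
```

Because the lookup is keyed on a value only this insert knows, it does not matter which connection or thread inserted other rows in the meantime.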

How to make tasks double-checked (the way how to store it in the DB)?

I have a DB that stores different types of tasks, plus other items, in different tables.
Items in many of these tables (which have different structures) need to be double-checked, meaning that an item can't count as 'saved' (it will of course be stored) until someone else opens the program and confirms it.
What is the right way to record which items are confirmed?
Option 1: each of these tables has an "IsConfirmed" column; when the reviewer wants to confirm everything, the program walks through all the tables and builds a list of the items that are not yet checked.
Option 2: a separate table holds the table name and Id of each row that has to be confirmed.
I hope you have a better idea than the two ugly options above.

Is the double-confirmed status something that happens exactly once for an entity? Or can it be rejected and need to go through confirmation again? In the latter case, do you need to keep all of this history? Do you need to keep track of who confirmed each time (e.g. so you don't have the same person performing both confirmations)?
The simple case:
ALTER TABLE dbo.Table ADD ConfirmCount TINYINT NOT NULL DEFAULT 0;
ALTER TABLE dbo.Table ADD Processed BIT NOT NULL DEFAULT 0;
On first confirmation:
UPDATE dbo.Table SET ConfirmCount = 1 WHERE PK = <PK> AND ConfirmCount = 0;
On second confirmation:
UPDATE dbo.Table SET ConfirmCount = 2 WHERE PK = <PK> AND ConfirmCount = 1;
When rejected:
UPDATE dbo.Table SET ConfirmCount = 0 WHERE PK = <PK>;
Now obviously your background job can only treat rows where Processed = 0 and ConfirmCount = 2. Then when it has processed that row:
UPDATE dbo.Table SET Processed = 1 WHERE PK = <PK>;
If you have a more complex scenario than this, please provide more details, including the goals of the double-confirm process.
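The guarded updates above can be exercised with a small Python sketch on SQLite. The table name Tasks, its columns and the helper names confirm and ready_to_process are hypothetical. The point is that each UPDATE names the state it expects, so a stale or repeated confirmation changes nothing, and the caller can tell from the row count.

```python
import sqlite3


def confirm(conn, task_id, expected_count):
    """Move ConfirmCount from expected_count to expected_count + 1.

    Returns True if the row was in the expected state and was updated.
    """
    cur = conn.execute(
        "UPDATE Tasks SET ConfirmCount = ? WHERE TaskID = ? AND ConfirmCount = ?",
        (expected_count + 1, task_id, expected_count),
    )
    return cur.rowcount == 1


def ready_to_process(conn):
    """Return the ids of rows that are fully confirmed but not yet processed."""
    rows = conn.execute(
        "SELECT TaskID FROM Tasks WHERE Processed = 0 AND ConfirmCount = 2 ORDER BY TaskID"
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Tasks (
        TaskID       INTEGER PRIMARY KEY,
        Title        TEXT,
        ConfirmCount INTEGER NOT NULL DEFAULT 0,
        Processed    INTEGER NOT NULL DEFAULT 0
    )
    """
)
conn.execute("INSERT INTO Tasks (Title) VALUES ('review invoice')")

before = ready_to_process(conn)
first_ok = confirm(conn, 1, 0)   # first confirmation
stale_ok = confirm(conn, 1, 0)   # repeated first confirmation: no effect
second_ok = confirm(conn, 1, 1)  # second confirmation
after = ready_to_process(conn)
```

If you also need to ensure the two confirmations come from different people, the row would have to record who confirmed, which this sketch does not do.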

Consider adding a new table to hold the records to be confirmed (e.g. TasksToBeConfirmed). Once the records are confirmed, move those records to the permanent table (Tasks).
The disadvantage of adding an "IsConfirmed" column is that virtually every SQL statement that uses the table will have to filter on "IsConfirmed" to prevent getting unconfirmed records. Every time this is missed, a defect is introduced.
In cases where you need confirmed and unconfirmed records, use UNION.
This pattern is a little more work to code and implement, but in my experience, significantly improves performance and reduces defects.

MS SQL Server trigger to update item rating and number of votes

To make this easier to understand, I will present the exact same problem as if it was about a forum (the actual app doesn't have to do with forums at all, but I think such a parallel is easier for most of us to grasp, the actual app is about something very specific that most programmers won't understand (it's an app intended for hardcore graphic designers)).
Let's suppose that there is a thread table that stores information about each forum thread and a threadrating table that stores thread ratings per user (1-5). For efficiency I decided to cache the rating average and number of votes in the thread table and triggers sounded like a good idea for updating it (I used to do such stuff in the actual application code, but I think triggers are worth a try, despite the debugging dangers).
As you know, MS SQL Server doesn't support a trigger to be executed per row, it has to be per statement. So I tried defining it this way:
CREATE TRIGGER thread_rating ON threadrating
AFTER INSERT
AS
    UPDATE thread
    SET
        thread.rating = (thread.rating * thread.voters + SUM(inserted.rating))/(thread.voters + COUNT(inserted.rating)),
        thread.voters = thread.voters + COUNT(inserted.rating)
    FROM thread
    INNER JOIN inserted ON(inserted.threadid = thread.threadid)
    GROUP BY inserted.threadid
but I get an error for the "GROUP BY" clause (which I expected). The question is, how can I make this work?
Sorry if the question is stupid, but it's the first time I actually try to use triggers.
Additional info: The thread table would contain threadid (int, primary key), rating (float), voters (int) and some other fields that are irrelevant to the current question.
The threadrating table only contains threadid (foreign key), userid (foreign key to the primary key of the users table) and rating (tinyint between 1 and 5).
The error message is "Incorrect syntax near the keyword 'GROUP'."

First, I strongly recommend that you not use triggers.
If you're getting a syntax error, check that your parens are balanced as well as your begin/ends. In your case, you have an end (at the end) but no begin. You can fix that by just removing the end.
Once you fix that, you'll likely get some more errors like "columns x,y,z not in an aggregate or group by". That's because you have several columns that are not in either. You need to add thread.rating, thread.voters, etc. to your group by or perform some kind of aggregate on them.
This is all assuming that there are multiple records with the same threadID (ie, it's not the primary key). If that's not the case, then what's the purpose of the group by?
Edit:
I'm stumped on the syntax error. I worked around it with a couple correlated sub queries. I guessed at your table structure so modify as needed and try this:
--CREATE TABLE ThreadRating (threadid int not null, userid int not null, rating int not null)
--CREATE TABLE Thread (threadid int not null, rating int not null, voters int not null)
ALTER TRIGGER thread_rating ON threadrating
AFTER INSERT
AS
    -- Aggregate only the newly inserted rows (the Inserted pseudo-table) for each thread;
    -- summing the whole ThreadRating table would count the existing votes twice.
    UPDATE Thread
    SET Thread.rating =
            (SELECT (Thread.Rating * Thread.Voters + SUM(I.Rating)) / (Thread.Voters + COUNT(I.Rating))
             FROM Inserted I WHERE I.ThreadID = Thread.ThreadID)
       ,Thread.Voters =
            (SELECT Thread.Voters + COUNT(I.Rating)
             FROM Inserted I WHERE I.ThreadID = Thread.ThreadID)
    FROM Thread
    JOIN Inserted ON Inserted.ThreadID = Thread.ThreadID
If that's what you wanted, then we can check the performance/execution plan and modify as needed. We might be able to get it to work with the group by yet.
Alternatives to triggers
If you are updating data that impact ratings in only a few select places, I'd recommend updating the ratings directly there. Factoring the logic into a trigger is nice but provides lots of problems (performance, visibility, etc.). This can be aided by a function.
Consider this: your trigger will execute every single time someone touches that table. Things like view counts, last updated dates, etc. will execute this trigger. You can add logic to short circuit the trigger in those cases but it gets complicated rapidly.

D'ohh! I totally misread your question and I thought you were asking about MySQL. Mea culpa! I will leave the solution below intact, and mark it as community wiki. Maybe it'll be useful to someone with a similar problem on MySQL.
MySQL triggers are executed per row. Also the pseudo-table "inserted" is a Microsoft SQL Server convention.
MySQL uses pseudo-tables NEW and OLD as extensions to the trigger language.
Here's a solution to your problem:
CREATE TRIGGER thread_rating
AFTER INSERT ON threadrating
FOR EACH ROW
BEGIN
    UPDATE thread
    SET rating = (rating*voters + NEW.rating)/(voters+1),
        voters = voters + 1
    WHERE threadid = NEW.threadid;
END
Likewise you'd need triggers for UPDATE and DELETE:
CREATE TRIGGER thread_rating_update
AFTER UPDATE ON threadrating
FOR EACH ROW
BEGIN
    UPDATE thread
    SET rating = (rating*voters - OLD.rating + NEW.rating)/voters
    WHERE threadid = NEW.threadid;
END
CREATE TRIGGER thread_rating_delete
AFTER DELETE ON threadrating
FOR EACH ROW
BEGIN
    UPDATE thread
    SET rating = (rating*voters - OLD.rating)/(voters-1),
        voters = voters - 1
    WHERE threadid = OLD.threadid;
END
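The arithmetic behind the three triggers can be checked in isolation. The following Python sketch mirrors the insert, update and delete formulas as plain functions. The function names are made up, and returning a rating of 0 when the last vote is removed is an assumption, since the delete formula above would divide by zero in that case.

```python
def add_vote(rating, voters, new):
    """Running average after one more vote (mirrors the AFTER INSERT trigger)."""
    return (rating * voters + new) / (voters + 1), voters + 1


def change_vote(rating, voters, old, new):
    """Running average after a voter changes a vote (mirrors AFTER UPDATE)."""
    return (rating * voters - old + new) / voters, voters


def remove_vote(rating, voters, old):
    """Running average after a vote is withdrawn (mirrors AFTER DELETE)."""
    if voters == 1:
        return 0.0, 0  # assumption: a thread with no votes stores rating 0
    return (rating * voters - old) / (voters - 1), voters - 1


state = (0.0, 0)                       # (rating, voters) for a new thread
state = add_vote(*state, 4)            # votes: [4]
after_first = state
state = add_vote(*state, 2)            # votes: [4, 2]
after_second = state
state = change_vote(*state, 2, 5)      # votes: [4, 5]
after_change = state
state = remove_vote(*state, 4)         # votes: [5]
after_remove = state
state = remove_vote(*state, 5)         # votes: []
after_last = state
```

Each step keeps the cached average equal to the true mean of the remaining votes, which is the invariant the triggers rely on.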

You may find the following reading helpful:
An introduction to Triggers
Wikipedia: DB Triggers
