I'm trying to sort multiple records of a model based on a field and store their ranks in the DB, like below:
$instances = Model::orderBy('field')->get();
$rank = 1;
foreach ($instances as $instance) {
    $instance->update([
        'rank' => $rank,
    ]);
    $rank++;
}
I have two questions:
1- Are there any alternative ways to avoid using a loop? For example, could I put the ranks in an array and update all the records with a single "magic" method? Something like:
$instances = Model::orderBy('field')->get();
$rank = 1;
$ranks_array = array();
foreach ($instances as $instance) {
    array_push($ranks_array, $rank);
    $rank++;
}
$instances->magicMethod($ranks_array);
2- Is it necessary to do so at all? Do loops like this put a heavy load on the server? I should mention that the number of records I'm going to update will never exceed 50.
For insert queries, inserting all records at once is much faster than inserting them one by one. For update queries, however, if you need to update specific rows with specific data, there is no other way than to update them one by one.
I recently came across a similar issue where I needed to update 90k+ rows in my DB.
Because I needed to set specific values on each row, I had to update each row individually.
What I found was that instead of doing
$DBModel = Model::get();
foreach ($DBModel as $row) {
    $row->notify = $row->paid;
    // the date is calculated from a specific carbon date from another column
    // I did not include the logic here for the example
    $row->due = "0000-00-00 00:00:00";
    $row->save();
}
Running the previous code took 5m33s.
But doing it like this
$DBModel = Model::get();
DB::beginTransaction();
foreach ($DBModel as $row) {
    DB::update("update TABLE_NAME set due = ?, notify = ? where id = ?",
        ["0000-00-00 00:00:00", $row->paid, $row->id]
    );
}
DB::commit();
This version took only 1m54s to execute.
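Applied to the original ranking example, a minimal sketch of the same idea (raw per-row updates wrapped in a single transaction) could look like the following. It assumes the model's underlying table is named models and has id and rank columns, so adjust the names to your schema:

use Illuminate\Support\Facades\DB;

$instances = Model::orderBy('field')->get();

DB::beginTransaction();
$rank = 1;
foreach ($instances as $instance) {
    // one lightweight raw update per row, all inside one transaction
    DB::update('update models set rank = ? where id = ?', [$rank, $instance->id]);
    $rank++;
}
DB::commit();

That said, with only ~50 rows the plain Eloquent loop from the question is unlikely to be a real bottleneck.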
In Dapper-Plus, is there a way to return the number of rows affected in the database? This is my code:
using (SqlConnection connection = new SqlConnection(Environment.GetEnvironmentVariable("sqldb_connection")))
{
    connection.BulkInsert(myList);
}
I see you can do it when inserting a single row, but I can't find that functionality on the Dapper Plus bulk insert.
Since Dapper Plus allows chaining multiple methods, the method doesn't directly return this value.
However, you can do it with the following code:
var resultInfo = new Z.BulkOperations.ResultInfo();
connection.UseBulkOptions(options => {
    options.UseRowsAffected = true;
    options.ResultInfo = resultInfo;
}).BulkInsert(orders);
// Show RowsAffected
Console.WriteLine("Rows Inserted: " + resultInfo.RowsAffectedInserted);
Console.WriteLine("Rows Affected: " + resultInfo.RowsAffected);
Fiddle: https://dotnetfiddle.net/mOMNng
Keep in mind that using that option will make the bulk operations slightly slower.
EDIT: To answer the comment:
"Will it make it as slow as using the regular Dapper insert method, or is this way still faster?"
It will still be way faster than a regular insert.
To give a simplified example:
I have a database with one table: names, which has 1 million records each containing a common boy or girl's name, and more added every day.
I have an application server that takes as input an HTTP request from parents using my website 'Name Chooser'. With each request, I need to pick a name from the DB and return it, and then NOT give that name to another parent. The server is concurrent, so it can handle a high volume of requests, yet it has to respect "unique name per request" and remain highly available.
What are the major components and strategies for an architecture of this use case?
From what I understand, you have two operations: Adding a name and Choosing a name.
I have a couple of questions:
Question 1: Do parents only choose names, or do they also add names?
Question 2: If they add names, does that mean that when a name is added it should also be marked as already chosen?
Assuming that you don't want to make all name-selection requests wait for one another (by locking or queueing them):
One solution to resolve concurrency in case of choosing a name only is to use Optimistic offline lock.
The most common implementation to this is to add a version field to your table and increment this version when you mark a name as chosen. You will need DB support for this, but most databases offer a mechanism for this. MongoDB adds a version field to the documents by default. For a RDBMS (like SQL) you have to add this field yourself.
You haven't specified what technology you are using, so I will give an example in pseudo code for an SQL DB. For MongoDB you can check how the DB makes these checks for you.
NameRecord {
    id,
    name,
    parentID,
    version,
    isChosen,

    function chooseForParent(parentID) {
        if (this.isChosen) {
            throw Error/Exception;
        }
        this.parentID = parentID;
        this.isChosen = true;
        this.version++;
    }
}
NameRecordRepository {
    function getByName(name) { ... }

    function save(record) {
        var oldVersion = record.version - 1;
        var query = "UPDATE records SET .....
                     WHERE id = {record.id} AND version = {oldVersion}";
        var rowsCount = db.execute(query);
        if (rowsCount == 0) {
            throw ConcurrencyViolation;
        }
    }
}
// somewhere else in an object or module or whatever...
function chooseName(parentID, name) {
    var record = NameRecordRepository.getByName(name);
    record.chooseForParent(parentID);
    NameRecordRepository.save(record);
}
Before this object is saved to the DB, a version comparison must be performed. SQL provides a way to execute a query based on some condition and return the number of affected rows. In our case we check whether the version in the database is still the same as the old one before updating. If it isn't, that means someone else has updated the record in the meantime.
In this simple case you can even remove the version field and use the isChosen flag in your SQL query like this:
var query = "UPDATE records SET .....
             WHERE id = {record.id} AND isChosen = false";
When adding a new name to the database you will need a unique constraint on the name column, which resolves the concurrency issue for inserts; a sketch follows below.
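For example (a sketch only, reusing the records table and name column from the pseudo code above; the constraint name is illustrative):

ALTER TABLE records ADD CONSTRAINT uq_records_name UNIQUE (name);

With that in place, two concurrent inserts of the same name cannot both succeed; the second one fails and can be handled by the application.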
Here's my situation: we have master tables with relationships to attribute tables. Sometimes, we fetch a row all by itself:
my $row = $rs->search({ some_key => 'some_value' })->first;
and sometimes we join one or more tables:
my $row = $rs->search({ some_key => 'some_value' }, { join => 'attributes' })->first;
We have "helper" methods that look up specific attributes:
sub get_x_attr {
    my $obj = shift;
    my $x_attr = $obj->attributes->search({ attribute_name => 'x' })->one_row;
    return $x_attr ? $x_attr->attribute_value : 'default';
}
This seems to issue another query, and while it's a pretty low-impact query, these add up when you are doing that zillions of times a day.
Now, if the row was joined originally, I could write the helper as:
my @attrs = grep { $_->attribute_name eq 'x' } $obj->attributes->all;
my $x_attribute = $attrs[0] || return 'default';
# etc.
and there'd be no additional query.
Here's my question: is there a safe, reliable way to interrogate $obj to see if it has its attributes pre-fetched? And further, is there any way to tell after the fact whether the join was conditional (e.g. WHERE attribute_name = 'some_other_value', which would make $obj->attributes rather useless here)?
(I did some digging, and found that $obj->{internals}{related_resultsets} has the answer to the first question, but since it's not part of the exposed API, I'm very much opposed to using it this way.)
Use the relationship methods: if the relationship is prefetched, the cached related rows are used; if not, a query is executed automatically.
Note that joining won't populate the cache; only prefetch does.
Calling search on the relationship will always issue a query; it never restricts the resultset via Perl code the way your grep example does.
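A minimal sketch of the difference (assuming the relationship accessor is named attributes, as in your helper):

# prefetch runs a single query and fills the related-row cache;
# a plain join would not populate it
my $row = $rs->search(
    { some_key => 'some_value' },
    { prefetch => 'attributes' },
)->first;

# the relationship accessor reads from that cache -- no extra query here
my ($x_attr) = grep { $_->attribute_name eq 'x' } $row->attributes->all;

# calling search on the relationship always hits the database again
my $x_attr_db = $row->attributes->search({ attribute_name => 'x' })->first;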
I have a large asp.net mvc application that runs on a database that is rapidly growing in size. When the database is empty, everything works quickly, but one of my tables now has 350K records in it and an insert is now taking 15s. Here is a snippet:
foreach (var packageSheet in packageSheets)
{
    // Create OrderSheets
    var orderSheet = new OrderSheet { Sheet = packageSheet.Sheet };

    // Add Default Options
    orderSheet.AddDefaultOptions();

    orderSheet.OrderPrints.Add(
        new OrderPrint
        {
            OrderPose = CurrentOrderSubject.OrderPoses.Where(op => op.Id == orderPoseId).Single(),
            PrintId = packageSheet.Sheet.Prints.First().Id
        });

    // Create OrderPackageSheets and add it to the order package held in the session
    var orderPackageSheet =
        new OrderPackageSheet
        {
            OrderSheet = orderSheet,
            PackageSheet = packageSheet
        };

    _orderPackageRepository.SaveChanges();
    ...
}
When I call SaveChanges at this point it takes 15s on the first iteration; each iteration after that is fast. I have indexed the tables in question, so I believe the database is tuned properly. It's the OrderPackageSheets table that contains the 350K rows.
Can anyone tell me how I can optimize this to get rid of the delay?
Thank you!
EF can be slow if you are inserting a lot of rows at the same time.
context.Configuration.AutoDetectChangesEnabled = false; won't do much for you if this is really a web app.
You would need to share your table definition; for instance, you could use the Simple recovery model, which will improve insert performance.
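On SQL Server (assumed here, since recovery models are a SQL Server concept; the database name is illustrative, and the recovery model also affects your backup strategy, so change it deliberately) that would be something like:

ALTER DATABASE MyAppDb SET RECOVERY SIMPLE;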
Or, as mentioned, if you need to insert a lot of rows, use bulk inserts.
If the number of records is very high, you can use a stored procedure instead of EF.
If you need to use EF itself, disable automatic change detection on the context using
context.Configuration.AutoDetectChangesEnabled = false;
and save the context once, after the loop (a sketch follows below).
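A rough sketch of that shape, assuming you can reach the underlying DbContext directly rather than the repository wrapper (the DbSet and entity names mirror the question and may need adjusting):

context.Configuration.AutoDetectChangesEnabled = false;
try
{
    foreach (var packageSheet in packageSheets)
    {
        var orderSheet = new OrderSheet { Sheet = packageSheet.Sheet };
        orderSheet.AddDefaultOptions();
        // ... set up OrderPrints exactly as in the question ...

        context.OrderPackageSheets.Add(new OrderPackageSheet
        {
            OrderSheet = orderSheet,
            PackageSheet = packageSheet
        });
    }

    // a single SaveChanges after the loop instead of one per iteration
    context.SaveChanges();
}
finally
{
    context.Configuration.AutoDetectChangesEnabled = true;
}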
Check these links
Efficient way to do bulk insert/update with Entity Framework
http://weblog.west-wind.com/posts/2013/Dec/22/Entity-Framework-and-slow-bulk-INSERTs
I wrote the code below in CakePHP for an updateAll query:
$this->loadModel('User');
$this->User->updateAll(array('status' => 'active'), array());
The above code generates the equivalent SQL query:
UPDATE User SET status='active' WHERE 0 = 1;
When I write updateAll in CakePHP like below
$this->loadModel('User');
$this->User->updateAll(array('status' => 'active'));
this code generates the equivalent SQL query:
UPDATE User SET status='active';
I don't know why this happens.
If you don't understand my question, let me know in the comments and I'll explain further.
It's a safety catch
Conditions are often dynamic based on user input. Consider a controller action like so:
function enableAll() {
    $conditions = array();
    ...
    if (whatever) {
        // Update only today's records
        $conditions['created > '] = $yesterday;
    }
    if ($this->Auth->user()) {
        // Update only my records
        $conditions['user_id'] = $this->Auth->user('id');
    }
    $this->Widget->updateAll(
        array('active' => 1),
        $conditions
    );
}
Logically conditions can be one of two things:
An array matching some or no records
An empty array
When it's an empty array, did the developer mean to update all records, or no records?
CakePHP can't know for sure, but an empty conditions array, when passed explicitly, is more likely to be a mistake where the intention was to update nothing. Therefore, to protect developers from accidentally updating everything, a condition is used which won't match any records (WHERE 0 = 1 is always false, so it matches no rows).
That's why this:
// I definitely want to update the whole table
$model->updateAll($update);
is treated differently than this:
// mistake? maybe the conditions have been forgotten...
$model->updateAll($update, array());