Mongoid ignores collection.insert if at least one duplicate exists

I have a unique index set up with drop_dups: true:
index({ key1: 1, key2: 1 }, { unique: true, drop_dups: true })
When inserting multiple records, I would like the non-duplicates to succeed, similar to MySQL's
INSERT IGNORE INTO table ...
...or even INSERT INTO table ... ON DUPLICATE KEY UPDATE.
So if I have a record:
Model.collection.insert({'key1'=>'not-unique', 'key2'=>'boo'})
It seems that the following call doesn't do anything.
Model.collection.insert(
{'key1'=>'not-unique', 'key2'=>'boo'},
{'key1'=>'im-unique', 'key2'=>'me-too'}
)
Is there a way to at least insert {'key1'=>'im-unique', 'key2'=>'me-too'} on the 2nd call?
Thanks!

It appears that you are looking for a batch insert with continue_on_error,
and not a single-document update/upsert, which I think you are already aware of.
http://api.mongodb.org/ruby/current/Mongo/Collection.html#insert-instance_method
With the 10gen driver, an equivalent of your example with a User model would be:
Gemfile
gem 'mongo'
gem 'bson_ext'
test/unit/user_test.rb (excerpt)
col = Mongo::Connection.new['sandbox_test']['users']
col.insert({'key1'=>'not-unique', 'key2'=>'boo'})
assert_equal(1, User.count)
col.insert([{'key1'=>'not-unique', 'key2'=>'boo'},{'key1'=>'im-unique', 'key2'=>'me-too'}], :continue_on_error => true)
assert_equal(2, User.count)
The above works in a simple unit test.
Mongoid 3 uses the Moped driver instead of the 10gen driver.
In the Moped 1.1.6 gem that you are using, insert does not support options/flags,
but Durran added flags to the insert method on GitHub circa July 28:
https://github.com/mongoid/moped/commit/2c7d6cded23f64f436fd9e992ddc98bbb7bbdaae
https://github.com/mongoid/moped/commit/f8157b43ef0e13da85dbfcb7a6dbebfa1fc8735c
As of mongoid 3.0.3 with moped 1.2.0, the following insert batch with continue_on_error works.
Model.collection.insert([{'key1'=>'not-unique', 'key2'=>'boo'},{'key1'=>'im-unique', 'key2'=>'me-too'}], [:continue_on_error])
Note that for a batch insert, the primary parameter to the insert method is an Array
enclosed in square brackets; the square brackets are missing from your post.
Hope that this helps.

Related

SQL Server - EF Core - Custom Constraint where all flags against id must be false except for one

I have a table generated through EF Core code-first and need a constraint where all the flags in records with a specific id must be false except for one, which has to be true, so there is always a single record with the flag set to true.
For example, I expect only one item to have IsAccountHolder set to true, and if there is only one record in the table then that record must have IsAccountHolder set to true.
I am wondering if this can be done using the code-first approach. I have tried the following, but it does not seem to work.
builder
.HasIndex(p => new {p.AccountHolderCustomerId, p.IsAccountHolder})
.HasFilter("[IsAccountHolder] = 1");
I found my answer here:
How can I stop EF Core from creating a filtered index on a nullable column
builder
.HasIndex(p => new {p.AccountHolderCustomerId, p.IsAccountHolder})
.IsUnique()
.HasFilter("[IsAccountHolder] = 1");
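For context, here is a minimal sketch of where that configuration might live; the entity name AccountCustomer and the OnModelCreating placement are assumed for illustration, only the property names come from the question:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<AccountCustomer>(builder =>
    {
        // Filtered unique index: among rows where IsAccountHolder = 1,
        // AccountHolderCustomerId must be unique, so there can be at most one
        // account holder per customer id; rows with IsAccountHolder = 0 are
        // ignored by the filter and may repeat freely.
        builder
            .HasIndex(p => new { p.AccountHolderCustomerId, p.IsAccountHolder })
            .IsUnique()
            .HasFilter("[IsAccountHolder] = 1");
    });
}
Note that a filtered unique index can only enforce "at most one" holder per customer id; the "if there is only one record it must be the holder" part still has to be handled in application logic.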

How to ensure developers filter by a foreign key in CakePHP

In a legacy project we had issues where, if a developer forgot the project_id in the query conditions, rows for all projects would be shown instead of only the single project they were meant to see. For example, for "Comments":
comments [id, project_id, message ]
If you forget to filter by project_id, you see comments from all projects. Sometimes this is caught by tests, sometimes not, but I would rather prevent it outright: the developer should see straightaway "WRONG/empty"!
To get around this, the product manager is insisting on separate tables for comments, like this:
project1_comments [id,message]
project2_comments [id,message]
Here, if you forgot the project/table name and something still passed the tests and got deployed, you would get nothing or an error.
However the difficulty is then with associated tables. Example "Files" linked to "Comments":
files [ id, comment_id, path ]
3, 1, files/foo/bar
project1_comments
id | message
1 | Hello World
project2_comments
id | message
1 | Bye World
This then turns into a database per project, which seems overkill.
Another possibility: how could a Behaviour be added to the Comments model to ensure any find/select query includes the foreign key, e.g. project_id?
Many thanks in advance.
In a legacy project we had issues where, if a developer forgot the project_id in the query conditions
CakePHP generates the join conditions based upon the associations you define for the tables. They are applied automatically when you use contain(), and it's unlikely a developer would make such a mistake with CakePHP.
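For reference, a minimal sketch of such an association definition; the ProjectsTable class and association names are assumed from the question's schema, not taken from your code:
// src/Model/Table/ProjectsTable.php (sketch; assumes namespace App\Model\Table and use Cake\ORM\Table)
class ProjectsTable extends Table
{
    public function initialize(array $config)
    {
        parent::initialize($config);
        // The ORM now knows Comments.project_id belongs to Projects.id,
        // so every association-based read carries that join condition.
        $this->hasMany('Comments');
    }
}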
To get around this, the product manager is insisting on separate tables for comments, like this:
Don't do it. Seems like a really bad idea to me.
Another possibility: how could a Behaviour be added to the Comments model to ensure any find/select query includes the foreign key, e.g. project_id?
The easiest solution is to just forbid all direct queries on the Comments table.
class CommentsTable extends Table
{
    public function find($type = 'all', $options = [])
    {
        // Block every direct query; Comments can only be read through associations.
        throw new \Cake\Network\Exception\ForbiddenException('Comments can not be used directly');
    }
}
Afterwards, only Comments read via an association will be allowed (associations always have valid join conditions), but think twice before doing this, as I don't see much benefit in such a restriction.
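For illustration, an association-based read might look like this; a sketch, assuming the ProjectsTable association above, with $projectId being whatever id your controller already has:
// Comments are fetched through the Projects association, so the
// Comments.project_id condition is generated by the ORM.
$project = $this->Projects->get($projectId, [
    'contain' => ['Comments'],
]);
$comments = $project->comments;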
You can't easily restrict direct queries on Comments to only those that contain a project_id in the where clause. The problem is that where clauses are an expression tree, and you'd have to traverse the tree and check all the different kinds of expressions. It's a pain.
What I would do instead is restrict Comments so that project_id has to be passed as an option to the finder.
$records = $Comments->find('all', ['project_id' => $project_id])->all();
What the above does is pass $project_id as an option to the default findAll method of the table. We can then override that method and force project_id as a required option for all direct Comments queries.
// In CommentsTable; assumes: use Cake\ORM\Query; use Cake\Utility\Hash;
// use Cake\Network\Exception\ForbiddenException;
public function findAll(Query $query, array $options)
{
    $project_id = Hash::get($options, 'project_id');
    if (!$project_id) {
        throw new ForbiddenException('project_id is required');
    }
    // Every direct find('all') is now scoped to a single project.
    return $query->where(['project_id' => $project_id]);
}
I don't see an easy way to do the above via a behavior, because the where clause contains only expressions by the time the behavior is executed.

Solr deletions with custom full import

I'm trying to use the DataImportHandler to keep my index in sync with a SQL database (what I would think is a pretty vanilla thing to do). Since my database will be pretty large, I want to use incremental imports using this method http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport so the calls are of the form http://localhost:8983/solr/Items/dataimport?command=full-import&clean=false. This works perfectly well for adding items.
I have a separate DeletedItems table in my database which contains the primary keys of the items that have been deleted from the Items table, along with when they were deleted. As part of the DataImport call, I had hoped to be able to delete the relevant items from my index based on a query along the lines of
SELECT Id FROM DeletedItems WHERE WasDeletedOn > '${dataimporter.last_index_time}'
but I can't figure out how to do this. The link above alludes to it with the cryptic
In this case it means obviously that in case you also want to use deletedPkQuery then when running the delta-import command is still necessary.
but setting deletedPkQuery to the above SQL query doesn't seem to work. I then read that deletedPkQuery only works with delta-imports, so am I forced to make two requests to my Solr server as part of the sync process? That doesn't seem right, as both operations are parameterized by the dataimporter.last_index_time property, which changes; surely both steps would need to happen in one "atomic" action? Any ideas?
You must use the DataImportHandler special commands:
https://wiki.apache.org/solr/DataImportHandler#Special_Commands
With these commands you can alter the boost of, or delete, a document coming from the result set of the full-import query. Be aware that you must use the $skipDoc field to prevent a deleted document from being indexed again, and that you must repeat the id in the $deleteDocById field.
You can use a union query:
select
    id,
    text,
    'false' as [$deleteDocById],
    'false' as [$skipDoc]
from [rows to update or add]
union
select
    id,
    '' as text,
    id as [$deleteDocById],
    'true' as [$skipDoc]
from [rows to delete]
or a case when:
select
    id,
    text,
    case
        when deleted = 1 then id
        else 'false'
    end as [$deleteDocById],
    case
        when deleted = 1 then 'true'
        else 'false'
    end as [$skipDoc]
from [your table]
where updated > '${dih.last_index_time}'
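If it helps, here is a sketch of how such a query could be wired into the DIH configuration; the entity, table and column names (item, items, text, deleted, updated) are assumed for illustration, not taken from your schema:
<entity name="item"
        query="SELECT id, text,
                      CASE WHEN deleted = 1 THEN id ELSE 'false' END AS [$deleteDocById],
                      CASE WHEN deleted = 1 THEN 'true' ELSE 'false' END AS [$skipDoc]
               FROM items
               WHERE updated > '${dih.last_index_time}'">
</entity>
With this in place, a plain full-import with clean=false picks up adds, updates and deletes in a single pass.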
The deletedPkQuery is run as part of the regular call to delta-import, so you don't have to run anything twice (and when doing a full-import with clean=true, there's no need for deletedPkQuery, since the whole index is cleared before importing anyway).
The deletedPkQuery should be configured on the same entity element as your main query. Be sure to match the field names exactly as well, and make sure the id produced by your deletedPkQuery matches the one provided by the main query.
There's a minimal example on solr.pl for importing and deleting documents using the same kind of deleted-entries table structure as you have here:
<entity
name="album"
query="SELECT * from albums"
deletedPkQuery="SELECT deleted_id as id FROM deletes WHERE deleted_at > '${dataimporter.last_index_time}'"
>
Also make sure that the format of the deleted_at field is comparable against the value produced by last_index_time. The default format is yyyy-MM-dd HH:mm:ss.
Lastly, remember that the last_index_time property isn't available until the second time the task is run, since there is no "previous index time" the first time an index is being populated (but the deletedPkQuery shouldn't need to run before that anyway).

HiveQL to HBase

I am using Hive 0.14 and HBase 0.98.8.
I would like to use HiveQL for accessing a HBase "table".
I created a table with a complex composite rowkey:
CREATE EXTERNAL TABLE db.hive_hbase (rowkey struct<p1:string, p2:string, p3:string>, column1 string, column2 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ';'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,cf:c1,cf:c2")
TBLPROPERTIES("hbase.table.name"="hbase_table");
The table is created successfully, but the following HiveQL query takes forever:
SELECT * from db.hive_hbase WHERE rowkey.p1 = 'xyz';
Queries that don't use the rowkey are fine, and filtering in the HBase shell also works.
I don't find anything in the logs, but I assume that there could be an issue with complex composite keys and performance.
Did anybody face the same issue? Hints to solve it? Other ideas, what I could try?
Thank you
Update 16.07.15:
I changed the log4j level to DEBUG and found some interesting information. It says:
2015-07-15 15:56:41,232 INFO ppd.OpProcFactory (OpProcFactory.java:logExpr(823)) - Pushdown Predicates of FIL For Alias : hive_hbase
2015-07-15 15:56:41,232 INFO ppd.OpProcFactory (OpProcFactory.java:logExpr(826)) - (rowkey.p1 = 'xyz')
But some lines later:
2015-07-15 15:56:41,430 DEBUG ppd.OpProcFactory (OpProcFactory.java:pushFilterToStorageHandler(1051)) - No pushdown possible for predicate: (rowkey.p1 = 'xyz')
So my guess is that HiveQL over HBase does not push this predicate down to HBase but instead starts a MapReduce job.
Could there be a bug with the predicate pushdown?
I tried a similar situation using Hive 0.13 and it works fine; I got the result. What version of Hive are you working on?

Rails 3, ActiveRecord, PostgreSQL - ".uniq" command doesn't work?

I have the following query:
Article.joins(:themes => [:users]).where(["articles.user_id != ?", current_user.id]).order("Random()").limit(15).uniq
and it gives me the error
PG::Error: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ...s"."user_id" WHERE (articles.user_id != 1) ORDER BY Random() L...
When I update the original query to
Article.joins(:themes => [:users]).where(["articles.user_id != ?", current_user.id]).order("Random()").limit(15)#.uniq
the error is gone. In MySQL .uniq works, but in PostgreSQL it does not. Is there an alternative?
As the error states, for SELECT DISTINCT the ORDER BY expressions must appear in the select list.
Therefore, you must explicitly select the expression you are ordering by.
Here is an example; it is similar to your case but generalized a bit.
Article.select('articles.*, RANDOM()')
.joins(:users)
.where(:column => 'whatever')
.order('Random()')
.uniq
.limit(15)
So, explicitly include your ORDER BY clause (in this case RANDOM()) using .select(). As shown above, in order for your query to return the Article attributes, you must explicitly select them also.
I hope this helps; good luck
Just to enrich the thread with more examples: in case you have nested relations in the query, you can try the following statement.
Person.find(params[:id]).cars.select('cars.*, lower(cars.name)').order("lower(cars.name) ASC")
In the given example, you're asking for all the cars of a given person, ordered by model name (Audi, Ferrari, Porsche).
I don't think this is necessarily a better way, but it may help to address this kind of situation by thinking in objects and collections instead of in a relational (database) way.
Thanks!
I assume that the .uniq method is translated to a DISTINCT clause in the SQL. PostgreSQL is picky (pickier than MySQL): when using DISTINCT, any expression in the ORDER BY (and GROUP BY) clause must also appear in the select list.
It's a little unclear what you are attempting to do (a random ordering?). In addition to posting the full SQL sent, if you could explain your objective, that might be helpful in finding an alternative.
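To illustrate at the SQL level (simplified, no joins), PostgreSQL rejects the first form below and accepts the second:
-- rejected: RANDOM() is ordered by but not selected
SELECT DISTINCT articles.* FROM articles ORDER BY RANDOM();
-- accepted: the ORDER BY expression also appears in the select list
SELECT DISTINCT articles.*, RANDOM() AS rand FROM articles ORDER BY rand;
Bear in mind that adding RANDOM() to the select list also changes what DISTINCT deduplicates, since every row gets its own random value.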
I just upgraded my 100% working and tested application from 3.1.1 to 3.2.7 and now have this same PG::Error.
I am using Cancan...
#users = User.accessible_by(current_ability).order('lname asc').uniq
Removing the .uniq solves the problem and it was not necessary anyway for this simple query.
Still looking through the change notes between 3.1.1 and 3.2.7 to see what caused this to break.
