Is there a way in Solr to perform bulk updates without specifying it document by document?
In Solr we can update a field of a single document at a time, but updating 1000 documents that way takes much longer. Is there any option to update a field across a thousand documents in one shot, or in one go?
No, there is nothing similar to UPDATE foo SET field = "bar" - you'll have to either submit the complete set of updated documents, or a batch of atomic update commands (each tied to a separate id):
[{"id":"mydoc", "price":{"set":99}},
{"id":"mydoc2", "price":{"set":199}}]
I went over the documentation for ClickHouse and I did not see an option to UPDATE or DELETE. It seems to me it's an append-only system.
Is there a possibility to update existing records, or is there some workaround, like truncating a partition that contains changed records and then re-inserting the entire data for that partition?
Using an ALTER query in ClickHouse we can delete/update rows in a table.
For delete: Query should be constructed as
ALTER TABLE testing.Employee DELETE WHERE Emp_Name='user4';
For Update: Query should be constructed as
ALTER TABLE testing.employee UPDATE AssignedUser='sunil' where AssignedUser='sunny';
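These ALTER statements run as asynchronous mutations, so they return before the rewrite has actually finished. A minimal way to check progress, assuming the testing.employee table from the UPDATE example above:
-- is_done = 1 means the mutation has been applied to all parts.
SELECT mutation_id, command, parts_to_do, is_done
FROM system.mutations
WHERE database = 'testing' AND table = 'employee';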
UPDATE: This answer is no longer accurate; see https://stackoverflow.com/a/55298764/3583139
ClickHouse doesn't support real UPDATE/DELETE.
But there are few possible workarounds:
Try to organize the data in a way that it does not need to be updated.
You could write log of update events to a table, and then calculate reports from that log. So, instead of updating existing records, you append new records to a table.
Use a table engine that does data transformation in the background during merges. For example, the (rather specific) CollapsingMergeTree table engine:
https://clickhouse.yandex/reference_en.html#CollapsingMergeTree
There is also the ReplacingMergeTree table engine (not documented yet; you can find an example in the tests: https://github.com/yandex/ClickHouse/blob/master/dbms/tests/queries/0_stateless/00325_replacing_merge_tree.sql).
The drawback is that you don't know when the background merge will be done, or whether it will ever be done.
Also look at samdoj's answer.
You can drop and re-create tables, but depending on their size this might be very time consuming.
For deletion, something like this could work:
INSERT INTO tableTemp SELECT * FROM table1 WHERE rowID != #targetRowID;
DROP TABLE table1;
-- recreate table1 with its original schema, then copy the data back:
INSERT INTO table1 SELECT * FROM tableTemp;
Similarly, to update a row, you could first delete it in this manner, and then add it.
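As a sketch of that update path, assuming for illustration that table1 has just the two columns rowID and someColumn (and keeping the #targetRowID placeholder from above), the value can also be rewritten while copying, instead of deleting and re-adding in two passes:
-- Copy every row, replacing someColumn only for the target row.
INSERT INTO tableTemp
SELECT rowID, if(rowID = #targetRowID, 'newValue', someColumn) AS someColumn
FROM table1;
-- Then drop table1, recreate it, and copy the data back, as in the delete example.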
Functionality to UPDATE or DELETE data has been added in recent ClickHouse releases, but it is an expensive batch operation that can't be performed too frequently.
See https://clickhouse.yandex/docs/en/query_language/alter/#mutations for more details.
It's an old question, but updates are now supported in ClickHouse. Note that doing many small updates is not recommended for performance reasons, but it is possible.
Syntax:
ALTER TABLE [db.]table UPDATE column1 = expr1 [, ...] WHERE filter_expr
Clickhouse UPDATE documentation
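For example (the database, table, and column names here are made up for illustration; the WHERE clause is mandatory):
-- Rewrites the status column for every row matching the filter.
ALTER TABLE shop.orders UPDATE status = 'shipped' WHERE order_id = 42;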
I have a table with several fields, two of which are "startdate" and "enddate", which mark the validity of the record. If I insert a new record, it cannot overlap with other records in terms of start date and end date.
Hence, on insertion of a new record I may need to adjust the "startdate" and "enddate" values of pre-existing records so they don't overlap with the new record. Similarly, any pre-existing records that overlap 100% with the new record will need to be deleted.
My table is an InnoDB table, which I know supports such transactions.
Are there any examples that show the use of insert / update / delete within a transaction (all must succeed in order for any one of them to succeed and be committed)?
I don't know how to do this. Most examples only show the use of saveAssociated(), which I'm not sure can cater for delete operations.
Thanks
Kevin
Perhaps you could use the beforeSave callback to search for the preexisting records and delete them before saving your new record.
from the docs:
Place any pre-save logic in this function. This function executes immediately after model data has been successfully validated, but just before the data is saved. This function should also return true if you want the save operation to continue.
I think you're looking to do Transactions: http://book.cakephp.org/2.0/en/models/transactions.html
That should allow you to run your queries - you start a transaction, perform any required actions, and then commit or roll back based on the outcome. Although, given your description, I'd think doing some reads and adjusting your data before committing anything might be a better approach. Either way, transactions aren't a bad idea!
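Underneath, CakePHP's transaction methods on the model's data source issue ordinary InnoDB transaction statements. A rough SQL sketch of the all-or-nothing sequence (the bookings table name and the dates are made up for illustration; startdate/enddate are from the question):
START TRANSACTION;
-- Delete pre-existing records fully covered by the new range.
DELETE FROM bookings
WHERE startdate >= '2014-01-10' AND enddate <= '2014-01-20';
-- Trim a record that overlaps the start of the new range
-- (a similar UPDATE would trim an overlap at the end).
UPDATE bookings
SET enddate = '2014-01-09'
WHERE startdate < '2014-01-10' AND enddate >= '2014-01-10';
-- Insert the new record.
INSERT INTO bookings (startdate, enddate) VALUES ('2014-01-10', '2014-01-20');
COMMIT;  -- or ROLLBACK if any step failed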
Hi, I have accidentally updated a row in SQL Server that I should not have. Is there any way to get the previous value of the row, given that I ran this query:
UPDATE Documents
SET Name = 'Files'
WHERE Id = 950
Is there any way to recover the previous value?
Yes, it is possible, but only under certain circumstances.
If you had wrapped the UPDATE in a transaction, you could ROLLBACK. This would undo the UPDATE.
Assuming you didn't put it in a transaction, you need to restore the database to a previous point in time. This is only possible if you have some form of backup of the database. How to do this is shown on this MSDN page.
Note that both of these options will UNDO the update, not just tell you the previous values.
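For next time, a minimal sketch of wrapping the statement in an explicit transaction so it can be inspected and undone before committing (same table and values as the query above):
BEGIN TRANSACTION;

UPDATE Documents
SET Name = 'Files'
WHERE Id = 950;

-- Check the result first, e.g.: SELECT Name FROM Documents WHERE Id = 950;
-- If it is wrong, undo it:
ROLLBACK TRANSACTION;
-- If it is correct, make it permanent instead:
-- COMMIT TRANSACTION;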
If we receive an update statement that does not check if the value has changed in the where clause, what are the different ways to ignore that update inside a trigger?
I know we can do a comparison of each individual field (handling the ISNULL side as well), but where it's a table that has 50+ fields, is there a faster/easier way to do it?
Note: I want to log every update event for the changed fields. For example, if I have 50 fields and only one of them is updated (for a single row, not the entire table), then I want to save only that field's old value and new value in the logs.
Thanks in Advance, RAHUL
If this is more about logging changes to tables, a simpler solution may be to use Change Data Capture (CDC) tables.
Every time a change is made to a table, a row is written to your CDC table. Then you could write a query over the CDC table to bring you back just the data that has changed.
More information on CDC tables is available here:
http://msdn.microsoft.com/en-us/library/bb522489(v=sql.105).aspx
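A minimal sketch of turning CDC on, assuming a hypothetical dbo.Employee table (CDC must be enabled at the database level first, and SQL Server Agent has to be running for the capture job):
-- Enable CDC for the current database, then for the table to track.
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Employee',
    @role_name     = NULL;

-- Changes are captured into cdc.dbo_Employee_CT; for an UPDATE, __$operation = 3
-- is the before image and __$operation = 4 is the after image.
SELECT * FROM cdc.dbo_Employee_CT;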
I am using the DataImportHandler for indexing data in Solr. I used full-import to index all the data in my database, which is around 10000 products. Now I am confused about the delta-import usage. Does it index new data added to the database on an interval basis, i.e. will it index the roughly 10 new rows added to my table, or does it only update changes in the already indexed data?
Can anyone please explain it to me with a simple example?
The DataImportHandler can be a little daunting. Your initial query has loaded 10,000 unique products; these are loaded when you run /dataimport?command=full-import.
When this import is done, the DIH stores a variable ({dataimporter.last_index_time}) which is the last date/time you did this import.
In order to do an update, you specify a deltaQuery. The deltaQuery is meant to identify the records that have changed in your database since the last update. So, you specify a query like this:
SELECT product_id
FROM sometable
WHERE [date_update] >= '${dataimporter.last_index_time}'
This will retrieve all the product_ids from your database that are updated since you last full update. The next query (deltaImportQuery) you need to specify is the query that will retrieve the full record for each product_id that you have from the previous step.
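A sketch of such a deltaImportQuery, reusing the sometable and product_id names from the deltaQuery above (name and price are made-up columns; the ${dataimporter.delta.product_id} variable holds each id returned by the deltaQuery):
SELECT product_id, name, price
FROM sometable
WHERE product_id = '${dataimporter.delta.product_id}'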
Assuming product_id is your unique key, Solr will figure out that it needs to update an existing record, or add a new one if the product_id doesn't exist yet.
In order to execute the deltaQuery and the deltaImportQuery you use /dataimport?command=delta-import
This is a great simplification of all the possibilities, check the Solr wiki on DataImportHandler, it is a VERY powerful tool!
On another note:
When you run a delta import within a small time window (like a couple of times in a few seconds) and the database server is on another machine than the Solr index service, make sure that the system time of both machines matches, since the [date_update] timestamp is generated on the database server and dataimporter.last_index_time is generated on the other machine.
Otherwise you will update too little of the index (or too much), depending on the time difference.
I agree that the Data Import Handler can handle this situation. One important limitation of the DIH is that it does not queue requests: if the DIH is "busy" indexing, it will ignore all further DIH requests until it is "idle" again. The skipped DIH requests are lost and not executed.