Apache Camel mybatis commits after every insert, want it not to - apache-camel

I cannot seem to stop mybatis from commit() ing after ever insert, delete or update. I would like to control this myself
from( A )
(begin)
... do some processing ...
lots of
.to( "mybtis:insertX?statementType=insertList" )
(commit)
and have written some mappings to do that, but after every insertList above and delete, mybatis immediately commits. I'm unsure how to tell it to stop doing that, and am worried looking through the source that commit() is built into those functions unavoidably in mybatis DefaultSqlSession for example seems to have commit() hardwired in (though of course it might be a mybatis-session commit rather than a db-commit, but hard to tell).
I might be able to achieve a similar effect with staging tables and things, but it seems messy and I'd prefer to use a correct mechanism if one exists
I expected it to be configurable on the mybatis configuration URI.
Maybe I just dont understand mybatis well enough to know how to do this

you are correct that each call to .to( "mybtis:insertX?statementType=insertList") will commit automatically, but you can pass in a List that you want to commit all at once...
from( A )
(begin)
... do some processing ...
... aggregate into a List ...
.to( "mybtis:insertX?statementType=insertList" ) //commits once after entire List is inserted
see https://svn.apache.org/repos/asf/camel/trunk/components/camel-mybatis/src/test/java/org/apache/camel/component/mybatis/MyBatisInsertListTest.java

#boday hm, it seems I have lost option to 'comment'.
Yes, but this is the problem, I have such a large batch to insert that I need to split it into groups of 10,000 or mybatis breaks unfortuantely. I also do a table-delete before . I wish they left some kind of 'backdoor' for special things without a commit.
I have a feeling that I may need to use a staging table and do some kind of 'rename' to get the atomic effect I'm looking for sadly.
I'm going to have a look at getting the 'openSession' the other guy mentioned. It also occurs to me I could totally frig it with my own custom 'SqlSession' that overides the commit() method to just do nothing basically, and it wouldn't be so terrible because it would only be one method, but is obviously very non-ideal.

Related

Is it possible to do a result_scan in snowflake with the query nested

The standard flow to query the result of a query looks like this.
doc
describe table your_db.your_schema.your_table_name;
select
"name"
,"type"
,*
from table(result_scan(last_query_id())) ;
Ideally, you should be able to
select
"name"
,"type"
,*
from table(result_scan(describe table your_db.your_schema.your_table_name)) ;
Is something like this possible in reality or must I keep dreaming?
It is not possible for the following reasons
The doc clearly states that the RESULT_SCAN function only accept a query ID as an input argument.
RESULT_SCAN function scans the result of a previously executed query. Therefore that query must have occurred as a different transaction in the past
It's also worth adding to Clark's answer that it's not presently possible, because presently the DESCRIBE TABLE is not run on the COMPUTE layer, but only of the SHARED SERVICES layer (meta data, query planner, etc).
Originally back in my day, there wasn't even a RESULTS_SCAN, and everything was externally parsed. So the RESULT_SCAN allows you to run a COMPUTE layer against the a prior result set, and thus we have a two step process. Now compared to other DB's where everything is a full DB object, thus everything is accessible via SQL this is frustrating.
But I feel when you know how/why it's the way it is, the steps make sense, and like many things once you get a "working pattern" you just do it. Albeit, at first it is different, and now how one might like it to be.

Which solution is preferred when populating two Cassandra tables?

I have a performance question. I got two tables in Cassandra, both of them have exactly same structure. I need to save incoming data in both of them. The problem I have is what would be better solution to do this:
Create two repositories, both of them open Cassandra session, save data to both tables separately (all in code).
Save data to one table, have a trigger on this table and copy incoming data to another one
Any other solution?
I think first two are ok, but I am not sure if first one is good enough. Can someone explain it to me?
This sounds like a good use case for BATCH. Essentially, you can assemble two write statements and execute them in a BATCH to ensure atomicity. That should keep the two tables in-sync. Example below from the DataStax docs (URL).
cqlsh> BEGIN LOGGED BATCH
INSERT INTO cycling.cyclist_names (cyclist_name, race_id) VALUES ('Vera ADRIAN', 100);
INSERT INTO cycling.cyclist_by_id (race_id, cyclist_name) VALUES (100, 'Vera ADRIAN');
APPLY BATCH;
+1 to Aaron's response about using BATCH statements but the quoted example is specific to CQL. It's a bit more nuanced when implementing it in your app.
If you're using the Java driver, a typical INSERT statement would look like:
SimpleStatement simpleInsertUser = SimpleStatement.newInstance(
"INSERT INTO users (...) VALUES (?), "..." );
And here's a prepared statement:
PreparedStatement psInsertUserByMobile = session.prepare(
"INSERT INTO users_by_mobile (...) VALUES (...)" );
If you were to batch these 2 statements:
BatchStatement batch = BatchStatement.newInstance(
DefaultBatchType.LOGGED,
simpleInsertBalance,
preparedInsertExpense.bind(..., false) );
session.execute(batch);
For item 2 in your list, I don't know companies who use Cassandra TRIGGERs in production so it isn't something I would recommend. It was experimental for a while and I don't have enough experience to be able to recommend them for production.
For item 3, this is the use case that Materialized Views are trying to solve. They are certainly a lot simpler from a dev point-of-view since the table updates are done server-side instead of client-side.
They are OK to use if you don't have a lot of tables but be aware that the updates to the views happen asynchronously (not at the same time as when the mutations occur on the base table). With MVs, there's also a risk that when the views get so out-of-sync with the base table, the only solution is to drop and recreate the MV.
If you prefer not to use BATCH statements, just make sure you're fully aware of the tradeoffs with using MVs. If you're interested, I've explained it in a bit more detail in https://community.datastax.com/articles/2774/. Cheers!

How can I find out where a database table is being populated from?

I'm in charge of an Oracle database for which we don't have any documentation. At the moment I need to know how a table is getting populated.
How can I find out which procedure, trigger, or other source, this table is getting its data from?
Or even better, query the DBA_DEPENDENCIES table (or its equivalent USER_ ). You should see what objects are dependent on them and who owns them.
select owner, name, type, referenced_owner
from dba_dependencies
where referenced_name = 'YOUR_TABLE'
And yeah, you need to see through the objects to see whether there is an INSERT happening in.
Also this, from my comment above.
If it is not a production system, I would suggest you to raise an user
defined exception in TRIGGER- before INSERT with some custom message
or LOCK the table from INSERT and watch over the applications which
try inserting into them failing. But yeah, you might also get calls
from many angry people.
It is quite simple ;-)
SELECT * FROM USER_SOURCE WHERE UPPER(TEXT) LIKE '%NAME_OF_YOUR_TABLE%';
In output you'll have all procedures, functions, and so on, that in ther body invoke your table called NAME_OF_YOUR_TABLE.
NAME_OF_YOUR_TABLE has to be written UPPERCASE because we are using UPPER(TEXT) in order to retrieve results as Name_Of_Your_Table, NAME_of_YOUR_table, NaMe_Of_YoUr_TaBlE, and so on.
Another thought is to try querying v$sql to find a statement that performs the update. You may get something from the module/action (or in 10g progam_id and program_line#).
DML changes are recorded in *_TAB_MODIFICATIONS.
Without creating triggers you can use LOG MINER to find all data changes and from which session.
With a trigger you can record SYS_CONTEXT variables into a table.
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions165.htm#SQLRF06117
Sounds like you want to audit.
How about
AUDIT ALL ON ::TABLE::;
Alternatively apply DBMS_FGA policy on the table and collect the client, program, user, and maybe the call stack would be available too.
Late to the party!
I second Gary's mention of v$sql also. That may yield the quick answer as long as the query hasn't been flushed.
If you know its in your current instance, I like a combination of what has been used above; if there is no dynamic SQL, xxx_Dependencies will work and work well.
Join that to xxx_Source to get that pesky dynamic SQL.
We are also bringing data into our dev instance using the SQL*Plus copy command (careful! deprecated!), but data can be introduced by imp or impdp as well. Check xxx_Directories for the directories blessed to bring data in/out.

Confirm before delete/update in SQL Management Studio?

So for the second day in a row, someone has wiped out an entire table of data as opposed to the one row they were trying to delete because they didn't have the qualified where clause.
I've been all up and down the mgmt studio options, but can't find a confirm option. I know other tools for other databases have it.
I'd suggest that you should always write SELECT statement with WHERE clause first and execute it to actually see what rows will your DELETE command delete. Then just execute DELETE with the same WHERE clause. The same applies for UPDATEs.
Under Tools>Options>Query Execution>SQL Server>ANSI, you can enable the Implicit Transactions option which means that you don't need to explicitly include the Begin Transaction command.
The obvious downside of this is that you might forget to add a Commit (or Rollback) at the end, or worse still, your colleagues will add Commit at the end of every script by default.
You can lead the horse to water...
You might suggest that they always take an ad-hoc backup before they do anything (depending on the size of your DB) just in case.
Try using a BEGIN TRANSACTION before you run your DELETE statement.
Then you can choose to COMMIT or ROLLBACK same.
In SSMS 2005, you can enable this option under Tools|Options|Query Execution|SQL Server|ANSI ... check SET IMPLICIT_TRANSACTIONS. That will require a commit to affect update/delete queries for future connections.
For the current query, go to Query|Query Options|Execution|ANSI and check the same box.
This page also has instructions for SSMS 2000, if that is what you're using.
As others have pointed out, this won't address the root cause: it's almost as easy to paste a COMMIT at the end of every new query you create as it is to fire off a query in the first place.
First, this is what audit tables are for. If you know who deleted all the records you can either restrict their database privileges or deal with them from a performance perspective. The last person who did this at my office is currently on probation. If she does it again, she will be let go. You have responsibilites if you have access to production data and ensuring that you cause no harm is one of them. This is a performance problem as much as a technical problem. You will never find a way to prevent people from making dumb mistakes (the database has no way to know if you meant delete table a or delete table a where id = 100 and a confirm will get hit automatically by most people). You can only try to reduce them by making sure the people who run this code are responsible and by putting into place policies to help them remember what to do. Employees who have a pattern of behaving irresponsibly with your busness data (particulaly after they have been given a warning) should be fired.
Others have suggested the kinds of things we do to prevent this from happening. I always embed a select in a delete that I'm running from a query window to make sure it will delete only the records I intend. All our code on production that changes, inserts or deletes data must be enclosed in a transaction. If it is being run manually, you don't run the rollback or commit until you see the number of records affected.
Example of delete with embedded select
delete a
--select a.* from
from table1 a
join table 2 b on a.id = b.id
where b.somefield = 'test'
But even these techniques can't prevent all human error. A developer who doesn't understand the data may run the select and still not understand that it is deleting too many records. Running in a transaction may mean you have other problems when people forget to commit or rollback and lock up the system. Or people may put it in a transaction and still hit commit without thinking just as they would hit confirm on a message box if there was one. The best prevention is to have a way to quickly recover from errors like these. Recovery from an audit log table tends to be faster than from backups. Plus you have the advantage of being able to tell who made the error and exactly which records were affected (maybe you didn't delete the whole table but your where clause was wrong and you deleted a few wrong records.)
For the most part, production data should not be changed on the fly. You should script the change and check it on dev first. Then on prod, all you have to do is run the script with no changes rather than highlighting and running little pieces one at a time. Now inthe real world this isn't always possible as sometimes you are fixing something broken only on prod that needs to be fixed now (for instance when none of your customers can log in because critical data got deleted). In a case like this, you may not have the luxury of reproducing the problem first on dev and then writing the fix. When you have these types of problems, you may need to fix directly on prod and you should have only dbas or database analysts, or configuration managers or others who are normally responsible for data on the prod do the fix not a developer. Developers in general should not have access to prod.
That is why I believe you should always:
1 Use stored procedures that are tested on a dev database before deploying to production
2 Select the data before deletion
3 Screen developers using an interview and performance evaluation process :)
4 Base performance evaluation on how many database tables they do/do not delete
5 Treat production data as if it were poisonous and be very afraid
So for the second day in a row, someone has wiped out an entire table of data as opposed to the one row they were trying to delete because they didn't have the qualified where clause
Probably the only solution will be to replace someone with someone else ;). Otherwise they will always find their workaround
Eventually restrict the database access for that person and provide them with the stored procedure that takes the parameter used in the where clause and grant them access to execute that stored procedure.
Put on your best Trogdor and Burninate until they learn to put in the WHERE clause.
The best advice is to get the muckety-mucks that are mucking around in the database to use transactions when testing. It goes a long way towards preventing "whoops" moments. The caveat is that now you have to tell them to COMMIT or ROLLBACK because for sure they're going to lock up your DB at least once.
Lock it down:
REVOKE delete rights on all your tables.
Put in an audit trigger and audit table.
Create parametrized delete SPs and only give rights to execute on an as needed basis.
Isn't there a way to give users the results they need without providing raw access to SQL? If you at least had a separate entry box for "WHERE", you could default it to "WHERE 1 = 0" or something.
I think there must be a way to back these out of the transaction journaling, too. But probably not without rolling everything back, and then selectively reapplying whatever came after the fatal mistake.
Another ugly option is to create a trigger to write all DELETEs (maybe over some minimum number of records) to a log table.

What's your #1 way to be careful with a live database? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
For my customer I occasionally do work in their live database in order to fix a problem they have created for themselves, or in order to fix bad data that my product's bugs created. Much like Unix root access, it's just dangerous. What lessons should I learn ahead of time?
What is the #1 thing you do to be careful about operating on live data?
BEGIN TRANSACTION;
That way you can rollback after a mistake.
Three things I've learned the hard way over the years...
First, if you're doing updates or deletes on live data, first write a SELECT query with the WHERE clause you'll be using. Make sure it works. Make sure it's correct. Then prepend the UPDATE/DELETE statement to the known working WHERE clause.
You never want to have
DELETE FROM Customers
sitting in your query analyzer waiting for you to write the WHERE clause... accidentally hit "execute" and you've just killed your Customer table. Oops.
Also, depending on your platform, find out how to take a quick'n'dirty backup of a table. In SQL Server 2005,
SELECT *
INTO CustomerBackup200810032034
FROM Customer
will copy every row from the entire Customer table into a new table called CustomerBackup200810032034, which you can then delete once you've done your updates and made sure everything's OK. If the worst happens, it's a lot easier to restore missing data from this table than to try and restore last night's backup from disk or tape.
Finally, be wary of cascade deletes getting rid of stuff you didn't intend to delete - check your tables' relationships and key constraints before modifying anything.
Do a backup first: it should be the number 1 law of sysadmining anyways
EDIT: incorporating what others have said, make sure your UPDATES have appropriate WHERE clauses.
Ideally, changing a live database should never happen (beyond INSERTs and basic maintenance). Changing the live DB's structure is especially fraught with potential bad karma.
Make your changes to a copy, and when you're satisfied, then apply the fix to live.
Often before I do an UPDATE or DELETE, I write the equivalent SELECT.
NEVER do an update unless you are in a BEGIN TRAN t1--not in a dev database, not in production, not anywhere. NEVER run a COMMIT TRAN t1 outside a comment--always type
--COMMIT TRAN t1
and then select the statement in order to run it. (Obviously, this only applies to GUI query clients.) If you do these things, it will become second nature to do them and you won't lose hardly any time.
I actually have a "update" macro that types this. I always paste this in to set up my updates. You can make a similar one for deletes and inserts.
begin tran t1
update
set
where
rollback tran t1
--commit tran t1
Always make sure your UPDATEs and DELETEs have the proper WHERE clause.
To answer my own question:
When writing an update statement, write it out of order.
Write UPDATE [table-name]
Write WHERE [conditions]
Go back and write SET [columns-and-values]
Choosing the rows you want to update before you say what values you want to change is much safer than doing it in the other order. It makes it impossible for update person set email = 'bob#bob.com' to be sitting in your query window, ready to be run by a misplaced keystroke, ready to mess up every row in the table.
Edit: As others have said, write the WHERE clause for your deletes before you write DELETE.
As an example, I create SQL like this
--Update P Set
--Select ID, Name as OldName,
Name='Jones'
From Person P
Where ID = 1000
I highlight the text from the end up to the Select and run that SQL. Once I verify that it is pulling the record I want to update, I hit shift-up to hightlight the Update statement and run that.
Note that I used an alias. I never update a table name explicity. I always use an alias.
If I do this in conjunction with transactions and rollback/commits, I am really, really safe.
My #1 way to be careful with a live database? Don't touch it. :)
Backups can undo damage that you inflict on the database, but you're still likely to introduce negative side effects during that span of time.
No matter how solid you think the script you're working with is, run it through a test cycle. Even if a "test cycle" means running the script against your own instance of the database, make sure you do it. It's much better to introduce defects on your local box than a production environment.
Check, recheck, and check again any statment that is doing updates. Even if you think you're just doing a simple, single column update, sooner or later you will not have enough coffee and forget a 'where' clause, nuking a whole table.
A couple other things I've found helpful:
if using MySQL, enable Safe updates
If you have a DBA, ask them to do it.
I 've found these 3 things have kept me from doing any serious harm.
Nobody wants backup but everyone cries for recovery
Create your DB with foreign key references, because you should:
make it as hard as possible for yourself to update/delete data and destroying the structural integrity / something else with that
If possible, run on a system where you have to commit the changes before you permanently store them (i.e. deactivate autocommit while repairing the db)
Try to identify your problem's classes so that you get an understanding how to fix without trouble
Get a routine in playing backups into a database, always have a second database on a test server at hand so you can just work on that
Because remember: If something fails totally, you need to be up and running again as fast as any possible
Well, that's about all I can think of now. Take the bold passages and you see whats #1 for me. ;-)
Maybe consider not using any deletes or drops at all. Or maybe reduce the user permissions so that only a special DB user can delete/drop things.
If you're using Oracle or another database that supports it, verify your changes before doing a COMMIT.
Data should always be deployed to live via scripts, which can be rehearsed as many times as it is required to get it right on dev. When there's dependent data for the script to run correctly on dev, stage it appropriately -- you can not get away with this step if you truly want to be careful.
Check twice, commit once!
Backup or dump the database before starting.
To add on to what #Wayne said, write your WHERE before the table name in a DELETE or UPDATE statement.
BACK UP YOUR DATA. Learned that one the hard way working with customer databases on a regular basis.
Always add a using clause.
My rule (as an app developer): Don't touch it! That's what the trained DBAs are for. Heck, I don't even want permission to touch it. :)
Different colors per environment: We've setup our PL\SQL developer (IDE for Oracle) so that when you logon to the production DB all the windows are in bright red. Some have gone as far as assigning a different color for dev and test as well.
Make sure you specify a where clause when deleting records.
always test any queries beyond select on development data first to ensure it has the correct impact.
if possible, ask to pair with someone
always count to 3 before pressing Enter (if alone, as this will infuriate your pair partner!)
If I'm updating a database with a script, I always make sure I put a breakpoint or two at the start of my script, just in case I hit the run/execute by accident.
I'll add to recommendations of doing BEGIN TRAN before your UPDATE, just don't forget to actually do the COMMIT; you can do just as much damage if you leave your uncommitted transaction open. Don't get distracted by phones, co-workers, lunch etc when in the middle of updates or you'll find everyone else is locked up until you COMMIT or ROLLBACK.
I always comment out any destructive queries (insert, update, delete, drop, alter) when writing out adhoc queries in Query Analyzer. That way, the only way to run them, is to highlight them, without selecting the commented part, and press F5.
I also think it's a good idea, as already mentioned, to write your where statement first, with a select, and ensure that you are altering the right data.
Always back up before changing.
Always make mods (eg. ALTER TABLE) via a script.
Always modify data (eg. DELETE) via a stored procedure.
Create a read only user (or get the DBA to do it) and only use that user to look at the DB. Add the appropriate permissions to schema so that you can view the content of stored procedures/views/triggers/etc. but not have the ability to change them.

Resources