what is equivalent of Oracle's Bulk Collect in DB2 - database

I have to return result set from Db2 to java program. result set have millions of rows. Is there something like bulk fetch of oracle in DB2.
Thanks in advance.

Db2 supports mult-row fetch for scrollable cursors. Check this link.
However, with millions of rows, it still may be inefficient compared to alternative approaches. Consider doing unload/export (depending on your Db2-server operating-system), and depending on what you want to do with the result-set, and how often you want to do this action.

Related

Which solution is preferred when populating two Cassandra tables?

I have a performance question. I got two tables in Cassandra, both of them have exactly same structure. I need to save incoming data in both of them. The problem I have is what would be better solution to do this:
Create two repositories, both of them open Cassandra session, save data to both tables separately (all in code).
Save data to one table, have a trigger on this table and copy incoming data to another one
Any other solution?
I think first two are ok, but I am not sure if first one is good enough. Can someone explain it to me?
This sounds like a good use case for BATCH. Essentially, you can assemble two write statements and execute them in a BATCH to ensure atomicity. That should keep the two tables in-sync. Example below from the DataStax docs (URL).
cqlsh> BEGIN LOGGED BATCH
INSERT INTO cycling.cyclist_names (cyclist_name, race_id) VALUES ('Vera ADRIAN', 100);
INSERT INTO cycling.cyclist_by_id (race_id, cyclist_name) VALUES (100, 'Vera ADRIAN');
APPLY BATCH;
+1 to Aaron's response about using BATCH statements but the quoted example is specific to CQL. It's a bit more nuanced when implementing it in your app.
If you're using the Java driver, a typical INSERT statement would look like:
SimpleStatement simpleInsertUser = SimpleStatement.newInstance(
"INSERT INTO users (...) VALUES (?), "..." );
And here's a prepared statement:
PreparedStatement psInsertUserByMobile = session.prepare(
"INSERT INTO users_by_mobile (...) VALUES (...)" );
If you were to batch these 2 statements:
BatchStatement batch = BatchStatement.newInstance(
DefaultBatchType.LOGGED,
simpleInsertBalance,
preparedInsertExpense.bind(..., false) );
session.execute(batch);
For item 2 in your list, I don't know companies who use Cassandra TRIGGERs in production so it isn't something I would recommend. It was experimental for a while and I don't have enough experience to be able to recommend them for production.
For item 3, this is the use case that Materialized Views are trying to solve. They are certainly a lot simpler from a dev point-of-view since the table updates are done server-side instead of client-side.
They are OK to use if you don't have a lot of tables but be aware that the updates to the views happen asynchronously (not at the same time as when the mutations occur on the base table). With MVs, there's also a risk that when the views get so out-of-sync with the base table, the only solution is to drop and recreate the MV.
If you prefer not to use BATCH statements, just make sure you're fully aware of the tradeoffs with using MVs. If you're interested, I've explained it in a bit more detail in https://community.datastax.com/articles/2774/. Cheers!

Select statement for over 500k records

I'm using this SELECT Statment
SELECT ID, Code, ParentID,...
FROM myTable WITH (NOLOCK)
WHERE ParentID = 0x0
This Statment is repeated each 15min (Through Windows Service)
The problem is the database become slow to other users when this query is runnig.
What is the best way to avoid slow performance while query is running?
Generate an execution plan for your query and inspect it.
Is the ParentId field indexed?
Are there other ways you might optimize the query?
Is it possible to increase the performance of the server that is hosting SQL Server?
Does it need more disk or RAM?
Do you have separate drives (spindles) for operating system, data, transaction logs, temporary databases?
Something else to consider - must you always retrieve the very latest values from this table for your application, or might it be possible to cache prior results and use those for some length of time?
Seems your table got huge number of records. You can think of implementing page-wise retrieval of data. You can first request for say TOP 100 rows and then having multiple calls to fetch rest of data.
I still don't understand need to run such query every 15 mins. You may think of implementing a stored procedure which can perform majority of processing and return you a small subset of data. This will be good improvement if it suits your requirement.

Optimal insert/update batch for SQL Server

I'm making frequent inserts and updates in large batches from c# code and I need to do it as fast as possible, please help me find all ways to speed up this process.
Build command text using StringBuilder, separate statements with ;
Don't use String.Format or StringBuilder.AppendFormat, it's slower then multiple StringBuilder.Append calls
Reuse SqlCommand and SqlConnection
Don't use SqlParameters (limits batch size)
Use insert into table values(..),values(..),values(..) syntax (1000 rows per statement)
Use as few indexes and constraints as possible
Use simple recovery model if possible
?
Here are questions to help update the list above
What is optimal number of statements per command (per one ExecuteNonQuery() call)?
Is it good to have inserts and updates in the same batch, or it is better to execute them separately?
My data is being received by tcp, so please don't suggest any Bulk Insert commands that involve reading data from file or external table.
Insert/Update statements rate is about 10/3.
Use table-valued parameters. They can scale really well when using large numbers of rows, and you can get performance that approaches BCP level. I blogged about a way of making that process pretty simple from the C# side here. Everything else you need to know is on the MSDN site here. You will get far better performance doing things this way rather than making little tweaks around normal SQL batches.
As of SQLServer2008 TableParameters are the way to go. See this article (step four)
http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/
I combined this with parallelizing the insertion process. Think that helped also, but would have to check ;-)
Use SqlBulkCopy into a temp table and then use the MERGE SQL command to merge the data.

What's the best way to convert one Oracle table (data) to fill a slightly different Oracle table?

I have two Oracle tables, an old one and a new one.
The old one was poorly designed (more so than mine, mind you) but there is a lot of current data that needs to be migrated into the new table that I created.
The new table has new columns, different columns.
I thought of just writing a PHP script or something with a whole bunch of string replacement... clearly that's a stupid way to do it though.
I would really like to be able to clean up the data a bit along the way as well. Some it was stored with markup in it (ex: "First Name"), lots of blank space, etc, so I would really like to fix all that before putting it into the new table.
Does anyone have any experience doing something like this? What should I do?
Thanks :)
I do this quite a bit - you can migrate with simple select statememt:
create table newtable as select
field1,
trim(oldfield2) as field3,
cast(field3 as number(6)) as field4,
(select pk from lookuptable where value = field5) as field5,
etc,
from
oldtable
There's really very little you could do with an intermediate language like php, etc that you can't do in native SQL when it comes to cleaning and transforming data.
For more complex cleanup, you can always create a sql function that does the heavy lifting, but I have cleaned up some pretty horrible data without resorting to that. Don't forget in oracle you have decode, case statements, etc.
I'd checkout an ETL tool like Pentaho Kettle. You'll be able to query the data from the old table, transform and clean it up, and re-insert it into the new table, all with a nice WYSIWYG tool.
Here's a previous question i answered regarding data migration and manipulation with Kettle.
Using Pentaho Kettle, how do I load multiple tables from a single table while keeping referential integrity?
If the data volumes aren't massive and if you are only going to do this once, then it will be hard to beat a roll-it-yourself program. Especially if you have some custom logic you need implemented.
The time taken to download, learn & use a tool (such as pentaho etc.) will probably not worth your while.
Coding a select *, updating columns in memory & doing an insert into will be quickly done in PHP or any other programming language.
That being said, if you find yourself doing this often, then an ETL tool might be worth learning.
I'm working on a similar project myself - migrating data from one model containing a couple of dozen tables to a somewhat different model of similar number of tables.
I've taken the approach of creating a MERGE statement for each target table. The source query gets all the data it needs, formats it as required, then the merge works out if the row already exists and updates/inserts as required. This way, I can run the statement multiple times as I develop the solution.
Depends on how complex the conversion process is. If it is easy enough to express in a single SQL statement, you're all set; just create the SELECT statement and then do the CREATE TABLE / INSERT statement. However, if you need to perform some complex transformation or (shudder) split or merge any of the rows to convert them properly, you should use a pipelined table function. It doesn't sound like that is the case, though; try to stick to the single statement as the other Chris suggested above. You definitely do not want to pull the data out of the database to do the transform as the transfer in and out of Oracle will always be slower than keeping it all in the database.
A couple more tips:
If the table already exists and you are doing an INSERT...SELECT statement, use the /*+ APPEND */ hint on the insert so that you are doing a bulk operation. Note that CREATE TABLE does this by default (as long as it's possible; you cannot perform bulk ops under certain conditions, e.g. if the new table is an index-organized table, has triggers, etc.
If you are on 10.2 or later, you should also consider using the LOG ERRORS INTO clause to log rejected records to an error table. That way, you won't lose the whole operation if one record has an error you didn't expect.

Long query prevents inserts

I have a query that runs each night on a table with a bunch of records (200,000+). This application simply iterates over the results (using a DbDataReader in a C# app if that's relevant) and processes each one. The processing is done outside of the database altogether. During the time that the application is iterating over the results I am unable to insert any records into the table that I am querying for. The insert statements just hang and eventually timeout. The inserts are done in completely separate applications.
Does SQL Server lock the table down while a query is being done? This seems like an overly aggressive locking policy. I could understand how there could be a conflict between the query and newly inserted records, but I would be perfectly ok if records inserted after the query started were simply not included in the results.
Any ways to avoid this?
Update:
The WITH (NOLOCK) definitely did the trick. As some of you pointed out, this isn't the cleanest approach. I can't really query everything into memory given the amount of records and some of the columns in this table are binary (some records are actually about 1MB of total data).
The other suggestion, was to query for batches of records at a time. This isn't a bad idea either, but it does bring up a new issue: database independent queries. Right now the application can work with a variety of different databases (Oracle, MySQL, Access, etc). Each database has their own way of limiting the rows returned in a query. But maybe this is better saved for another question?
Back on topic, the "WITH (NOLOCK)" clause is certainly SQL Server specific, is there any way to keep this out of my query (and thus preventing it from working with other databases)? Maybe I could somehow specify a parameter on the DbCommand object? Or can I specify the locking policy at the database level? That is, change some properties in SQL Server itself that will prevent the table from locking like this by default?
If you're using SQL Server 2005+, then how about giving the new MVCC snapshot isolation a try. I've had good results with it:
ALTER DATABASE SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE SET MULTI_USER;
It will stop readers blocking writers and vice-versa. It eliminates many deadlocks, at very little cost.
It depends what Isolation Level you are using. You might try doing your selects using the With (NoLock) hint, that will prevent the read locks, but will also mean the data being read might change before the selecting transaction completes.
The first thing you could do is try to add the "WITH (NOLOCK)" to any tables you have in your query. This will "Tame down" the locking that SQL Server does. An example of using "NOLOCK" on a join is as follows...
SELECT COUNT(Users.UserID)
FROM Users WITH (NOLOCK)
JOIN UsersInUserGroups WITH (NOLOCK) ON
Users.UserID = UsersInUserGroups.UserID
Another option is to use a dataset instead of a datareader. A datareader is a "fire hose" technique that stays connected to the tables while your program is processing and basically handling the table row by row through the hose. A dataset uses a "disconnected" methodology where all the data is loaded into memory and then the connection is closed. Your program can then loop the data in memory without having to worry about locking. However, if this is a really large amount of data, there maybe memory issues.
Hope this helps.
If you add the WITH (NOLOCK) hint after a table name in the FROM clause it should make sure it doesn't lock, and it doesn't care about reading data that is locked. You might get "out of date" results if you are writing at the same time, but if you don't care about that then you should be fine.
I reckon your best way of avoiding this is to do it in SQL rather than in the application.
You can add a
WAITFOR DELAY '000:00:01'
at the end of each loop iteration to provide time for other processes to run - just make sure that you haven't initiated a TRANSACTION such that all other processes are locked out anyway
The query is performing a table lock, thus the inserts are failing.
It sounds to me like you're keeping a lock on the table while processing the results.
You should instead load them into an array or collection of some sort, and close the database connection.
Then process the array.
In addition, while you're doing your select use either:
WITH(NOLOCK) or WITH(READPAST)
I'm not a big fan of using lock hints as you could end up with dirty reads or other weirdness. A couple of other ideas:
Can you break the number of rows down so you don't grab 200k at a time? Is there a way to tell whether you've processed a row - a flag, a timestamp - you could use to make the query? Your query could be 'SELECT TOP 5000 ...' getting a differnet 5k each time. Shorter queries mean shorter-lived locks.
If you can use smaller sets of rows I like the DataSet vs. IDataReader idea. You will be loading data into memory and not consuming any SQL locks, but the amount of memory can cause other problems.
-Brian
You should be able to set the isolation level at the .NET level so that you don't have to include the WITH (NOLOCK) hint.
If you want to go with the batching option, you should be able to specify the Rowcount setting from the .NET level which would tell the database to only return n number of records. By setting these settings at the .NET level they should become database independent and work across all the platforms.

Resources