Which solution is preferred when populating two Cassandra tables?

I have a performance question. I have two tables in Cassandra, both with exactly the same structure, and I need to save incoming data to both of them. The question is which would be the better way to do this:
Create two repositories, each opening a Cassandra session, and save the data to both tables separately (all in code).
Save data to one table, have a trigger on this table and copy incoming data to another one
Any other solution?
I think the first two are OK, but I am not sure whether the first one is good enough. Can someone explain it to me?

This sounds like a good use case for BATCH. Essentially, you can assemble two write statements and execute them in a BATCH to ensure atomicity. That should keep the two tables in sync. Example below from the DataStax docs (URL).
cqlsh> BEGIN LOGGED BATCH
INSERT INTO cycling.cyclist_names (cyclist_name, race_id) VALUES ('Vera ADRIAN', 100);
INSERT INTO cycling.cyclist_by_id (race_id, cyclist_name) VALUES (100, 'Vera ADRIAN');
APPLY BATCH;

+1 to Aaron's response about using BATCH statements but the quoted example is specific to CQL. It's a bit more nuanced when implementing it in your app.
If you're using the Java driver, a typical INSERT statement would look like:
SimpleStatement simpleInsertUser = SimpleStatement.newInstance(
"INSERT INTO users (...) VALUES (...)" );
And here's a prepared statement:
PreparedStatement psInsertUserByMobile = session.prepare(
"INSERT INTO users_by_mobile (...) VALUES (...)" );
If you were to batch these 2 statements:
BatchStatement batch = BatchStatement.newInstance(
DefaultBatchType.LOGGED,
simpleInsertUser,
psInsertUserByMobile.bind(...) );
session.execute(batch);
For item 2 in your list, I don't know of companies that use Cassandra TRIGGERs in production, so it isn't something I would recommend. They were experimental for a while and I don't have enough experience with them to recommend them for production use.
For item 3, this is the use case that Materialized Views are trying to solve. They are certainly a lot simpler from a dev point-of-view since the table updates are done server-side instead of client-side.
They are OK to use if you don't have a lot of tables, but be aware that updates to the views happen asynchronously (not at the same time as the mutations on the base table). With MVs there is also the risk that a view gets so out of sync with the base table that the only fix is to drop and recreate the MV.
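For illustration, a hedged sketch of what such a view could look like, reusing the cycling tables from the batch example above (the key layout is an assumption, not taken from your schema):
CREATE MATERIALIZED VIEW cycling.cyclist_by_id AS
    SELECT race_id, cyclist_name
    FROM cycling.cyclist_names
    WHERE race_id IS NOT NULL AND cyclist_name IS NOT NULL
    -- assumes cyclist_name is the primary key of the base table
    PRIMARY KEY (race_id, cyclist_name);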
If you prefer not to use BATCH statements, just make sure you're fully aware of the tradeoffs with using MVs. If you're interested, I've explained it in a bit more detail in https://community.datastax.com/articles/2774/. Cheers!

Related

I am struggling with migrating temp tables (SQL Server) to Oracle

I am struggling with migrating temp tables (SQL Server) to Oracle. Oracle generally discourages the use of temporary tables inside stored procedures, whereas in SQL Server temp tables are routinely used to fetch small sets of records and manipulate them.
How do I overcome this? I have been searching online articles about migrating temp tables to Oracle, but they do not explain things clearly enough for what I need.
I have seen suggestions to use an inline view, a WITH clause, or a ref cursor instead of a temp table, and I am totally confused.
Please suggest in which cases I should use an inline view, a WITH clause, or a ref cursor.
This would help improve my knowledge and also help me do my job well.
As always, thank you for your valuable time in helping out the newbies.
Thanks
Alsatham hussain
Like many questions, the answer is "it depends". A few things:
Oracle's "temp" table is called a GLOBAL TEMPORARY TABLE (GTT). Unlike most other vendor's TEMP tables, their definition is global. Scripts or programs in SQL Server (and others), will create a temp table and that temp table will disappear at the end of a session. This means that the script or program can be rerun or run concurrently by more than one user. However, this will not work with a GTT, since the GTT will remain in existence at the end of the session, so the next run that attempts to create the GTT will fail because it already exists.
So one approach is to pre-create the GTT, just like the rest of the application tables, and then change the program to INSERT into the GTT rather than creating it.
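For example, a minimal sketch of that approach (table and column names are made up):
-- Pre-created once, like any other application table; rows are private to each session
CREATE GLOBAL TEMPORARY TABLE tmp_staging (
    id      NUMBER,
    payload VARCHAR2(200)
) ON COMMIT PRESERVE ROWS;

-- The program then inserts into the existing GTT instead of creating a temp table
INSERT INTO tmp_staging (id, payload)
SELECT order_id, status FROM orders WHERE status = 'PENDING';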
As others have said, using a CTE (Common Table Expression) could potentially work, but it depends on the motivation for using the temp table in the first place. One of the advantages of a temp table is that it provides a "checkpoint" in a series of steps and allows stats to be gathered on intermediate temporary data sets in what is a complex set of processing. A CTE does not provide that benefit.
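For comparison, a minimal sketch of the WITH-clause alternative (names are invented); the intermediate result exists only for the duration of the query:
WITH pending_orders AS (
    SELECT order_id, customer_id, amount
    FROM   orders
    WHERE  status = 'PENDING'
)
SELECT customer_id, SUM(amount) AS total_amount
FROM   pending_orders
GROUP  BY customer_id;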
Others "intermediate" objects such as collections could also be used, but they have to be "managed" and do not really provide any of the advantages of being able to collect stats on them.
So, as I said at the beginning, your choice of solution will depend somewhat on the motivation for the original temp table in the first place.

Optimal insert/update batch for SQL Server

I'm making frequent inserts and updates in large batches from C# code and I need to do it as fast as possible. Please help me find all the ways to speed up this process.
Build command text using StringBuilder, separate statements with ;
Don't use String.Format or StringBuilder.AppendFormat; it's slower than multiple StringBuilder.Append calls
Reuse SqlCommand and SqlConnection
Don't use SqlParameters (limits batch size)
Use the INSERT INTO table VALUES (..), (..), (..) syntax (up to 1000 rows per statement; see the sketch after this list)
Use as few indexes and constraints as possible
Use simple recovery model if possible
?
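To illustrate the multi-row VALUES item above, a minimal sketch (table and columns are made up):
-- Hypothetical example of the multi-row VALUES syntax (SQL Server caps it at 1000 row value expressions per statement)
INSERT INTO dbo.Readings (SensorId, ReadingTime, Value)
VALUES (1, '2012-01-01T10:00:00', 20.5),
       (1, '2012-01-01T10:00:05', 20.7),
       (2, '2012-01-01T10:00:00', 19.9);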
Here are some questions to help refine the list above:
What is the optimal number of statements per command (per ExecuteNonQuery() call)?
Is it good to have inserts and updates in the same batch, or is it better to execute them separately?
My data is being received by tcp, so please don't suggest any Bulk Insert commands that involve reading data from file or external table.
The insert/update statement ratio is about 10/3.
Use table-valued parameters. They can scale really well when using large numbers of rows, and you can get performance that approaches BCP level. I blogged about a way of making that process pretty simple from the C# side here. Everything else you need to know is on the MSDN site here. You will get far better performance doing things this way rather than making little tweaks around normal SQL batches.
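For reference, a minimal sketch of the server-side pieces a table-valued parameter needs (type, procedure, and column names are made up); on the C# side you pass a DataTable or DbDataReader as a SqlDbType.Structured parameter:
-- Hypothetical table type plus a procedure that accepts it
CREATE TYPE dbo.ReadingTableType AS TABLE (
    SensorId    INT,
    ReadingTime DATETIME2,
    Value       DECIMAL(18, 2)
);
GO
CREATE PROCEDURE dbo.InsertReadings
    @Rows dbo.ReadingTableType READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- One set-based insert for the whole batch
    INSERT INTO dbo.Readings (SensorId, ReadingTime, Value)
    SELECT SensorId, ReadingTime, Value
    FROM @Rows;
END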
As of SQL Server 2008, table-valued parameters are the way to go. See this article (step four):
http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/
I combined this with parallelizing the insertion process. I think that helped as well, but I would have to check ;-)
Use SqlBulkCopy into a temp table and then use the MERGE SQL command to merge the data.
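A rough sketch of the second step, assuming the rows were bulk-copied into a temp staging table first (names are made up):
-- Hypothetical upsert of staged rows into the target table
MERGE dbo.Readings AS target
USING #ReadingsStaging AS source
    ON target.SensorId = source.SensorId
   AND target.ReadingTime = source.ReadingTime
WHEN MATCHED THEN
    UPDATE SET target.Value = source.Value
WHEN NOT MATCHED THEN
    INSERT (SensorId, ReadingTime, Value)
    VALUES (source.SensorId, source.ReadingTime, source.Value);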

Safe/reliable/standard process for making major changes to a database with existing data?

I would like to take one table that is heavy with flags and fields, and break it into smaller tables. The parent table to be revised/broken down already contains live data that must be handled with care.
Here is my plan of attack, which I'm hoping to execute this weekend while no one is using the system.
Create the new tables that we will need
Rename the existing parent table, ParentTable, to ParentTableOLD
Create a new table called ParentTable with the unneeded fields gone, and new fields added
Run a procedure to copy the entries in ParentTableOLD to the new tables, mapping old data to new tables/fields where applicable
Delete the ParentTableOLD table
The above seems pretty reasonable and simple to me; I'm fairly certain it will work. I'm interested in other techniques to achieve this (the above is the only approach I can think of), as well as any kind of tools to help stay organized. Right now I'm running on pen and paper.
Reason I ask is that several times now, I've been re-inventing the wheel just because I didn't know any better, and someone more experienced came along and saw what I was doing and said, "oh there's a built-in way to help do this," or, "there's a simpler way to do this." I did coding for months and months with Visual Studio before someone stopped by and said "you know about breakpoints to step through the code, yeah?" --- life changing, hah.
I have SQL Server 2008 R2 with SSMS.
A good trick to assist you in creating your '_old' tables is:
SELECT *
INTO mytable_old
FROM mytable
SELECT INTO will copy all of the data and create your table for you in one step.
That said, I would actually retain the current table names and instead copy everything into another schema. This will make adapting queries and reports to run over the old schema (where needed) a lot easier than having to add '_old' to all the names, since you can just find/replace the schema names.
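A minimal sketch of that variant (the schema name is made up):
-- Copy the current table into a separate schema instead of renaming it
CREATE SCHEMA old_version;
GO
SELECT *
INTO old_version.ParentTable
FROM dbo.ParentTable;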
If at all possible, I'd be doing this in some sort of test environment first and foremost. If you have external applications that rely on the database, make sure they all run against your modified structure without any hiccups.
Also do a search of your database objects that might reference the table you are going to rename. For example:
SELECT Name
FROM sys.procedures
WHERE OBJECT_DEFINITION(OBJECT_ID) LIKE '%MyTable%'
Try and ensure some sort of functional equivalence from queries between your new and old schemas. Have some query/queries that can be run against your renamed table and then have your reworked schema referencing your new table structure. This way you can make sure the data returned is the same for both structures. If possible, prepare these all ahead of time so that it is simply a series of checks you can do once you've done your modifications and if there are differences, this can help you decide whether you proceed with the change or back it out.
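One simple way to sketch such a check, assuming the key and shared columns exist in both structures (column names here are placeholders):
-- Rows in the old table that are missing or different in the new structure
SELECT KeyCol, Col1, Col2 FROM dbo.ParentTableOLD
EXCEPT
SELECT KeyCol, Col1, Col2 FROM dbo.ParentTable;
-- An empty result here (and for the reverse EXCEPT) suggests the two structures return the same data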
Lastly, have a plan for working out how you could revert to the old schema if something catastrophic were to occur. If you'd been working with your new table structure for a period of time and then discovered a major issue, could you revert to the old table and successfully get the data out of your modified table structure back into it? Basically, follow the Boy Scout motto and be prepared.
This isn't really an answer for your overall problem, but a couple tools that you might find useful for your Step 4 is RedGate's SQL Compare and Data Compare. SQL Compare will perform schema migrations, and Data Compare will help migrate data. You can move data to new columns and new tables, populate default values, sync from dev to production, among other things.
You can make your changes in a dev environment with production data, and when you're satisfied with the process, do the actual migration in production.
Make a backup of the database (for reference: http://msdn.microsoft.com/en-us/library/ms187510.aspx) and then do the required steps. If everything goes fine, go ahead; otherwise restore the old database (for reference: http://msdn.microsoft.com/en-us/library/ms177429.aspx).
You can even automate this backup process to run, say, every week.
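For reference, the basic commands look roughly like this (the database name and path are made up):
-- Full backup before the change; restore it if something goes wrong
BACKUP DATABASE MyDatabase
TO DISK = N'D:\Backups\MyDatabase_pre_migration.bak'
WITH INIT;

-- RESTORE DATABASE MyDatabase
-- FROM DISK = N'D:\Backups\MyDatabase_pre_migration.bak'
-- WITH REPLACE;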

What's the best way to convert one Oracle table (data) to fill a slightly different Oracle table?

I have two Oracle tables, an old one and a new one.
The old one was poorly designed (even more so than mine, mind you), but there is a lot of current data that needs to be migrated into the new table that I created.
The new table has new columns, different columns.
I thought of just writing a PHP script or something with a whole bunch of string replacement... clearly that's a stupid way to do it though.
I would really like to be able to clean up the data a bit along the way as well. Some of it was stored with markup in it (ex: "First Name"), lots of blank space, etc., so I would really like to fix all that before putting it into the new table.
Does anyone have any experience doing something like this? What should I do?
Thanks :)
I do this quite a bit - you can migrate with a simple SELECT statement:
create table newtable as
select
    field1,
    trim(oldfield2) as field3,
    cast(field3 as number(6)) as field4,
    (select pk from lookuptable where value = field5) as field5
    -- etc.
from
    oldtable;
There's really very little you can do with an intermediate language like PHP that you can't do in native SQL when it comes to cleaning and transforming data.
For more complex cleanup you can always create a SQL function that does the heavy lifting, but I have cleaned up some pretty horrible data without resorting to that. Don't forget that in Oracle you have DECODE, CASE expressions, etc.
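For instance, a rough sketch of the kind of inline cleanup meant here (column names are invented):
-- Strip simple markup, trim whitespace, and normalise a flag column on the way across
SELECT trim(regexp_replace(old_name, '<[^>]+>', '')) AS name,
       CASE WHEN upper(trim(old_active)) IN ('Y', 'YES', '1') THEN 'ACTIVE'
            ELSE 'INACTIVE'
       END AS status
FROM oldtable;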
I'd check out an ETL tool like Pentaho Kettle. You'll be able to query the data from the old table, transform and clean it up, and re-insert it into the new table, all with a nice WYSIWYG tool.
Here's a previous question I answered regarding data migration and manipulation with Kettle:
Using Pentaho Kettle, how do I load multiple tables from a single table while keeping referential integrity?
If the data volumes aren't massive and if you are only going to do this once, then it will be hard to beat a roll-it-yourself program. Especially if you have some custom logic you need implemented.
The time taken to download, learn and use a tool (such as Pentaho, etc.) will probably not be worth your while.
Coding a select *, updating columns in memory & doing an insert into will be quickly done in PHP or any other programming language.
That being said, if you find yourself doing this often, then an ETL tool might be worth learning.
I'm working on a similar project myself - migrating data from one model containing a couple of dozen tables to a somewhat different model of similar number of tables.
I've taken the approach of creating a MERGE statement for each target table. The source query gets all the data it needs, formats it as required, then the merge works out if the row already exists and updates/inserts as required. This way, I can run the statement multiple times as I develop the solution.
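A stripped-down sketch of one such MERGE (table and column names are placeholders):
-- Rerunnable load for one target table: update existing rows, insert missing ones
MERGE INTO new_customer tgt
USING (SELECT old_id, trim(old_name) AS cust_name FROM old_customer) src
   ON (tgt.customer_id = src.old_id)
WHEN MATCHED THEN
    UPDATE SET tgt.cust_name = src.cust_name
WHEN NOT MATCHED THEN
    INSERT (customer_id, cust_name)
    VALUES (src.old_id, src.cust_name);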
Depends on how complex the conversion process is. If it is easy enough to express in a single SQL statement, you're all set; just create the SELECT statement and then do the CREATE TABLE / INSERT statement. However, if you need to perform some complex transformation or (shudder) split or merge any of the rows to convert them properly, you should use a pipelined table function. It doesn't sound like that is the case, though; try to stick to the single statement as the other Chris suggested above. You definitely do not want to pull the data out of the database to do the transform as the transfer in and out of Oracle will always be slower than keeping it all in the database.
A couple more tips:
If the table already exists and you are doing an INSERT...SELECT statement, use the /*+ APPEND */ hint on the insert so that you are doing a bulk operation. Note that CREATE TABLE does this by default (as long as it's possible; you cannot perform bulk ops under certain conditions, e.g. if the new table is an index-organized table, has triggers, etc.).
If you are on 10.2 or later, you should also consider using the LOG ERRORS INTO clause to log rejected records to an error table. That way, you won't lose the whole operation if one record has an error you didn't expect.
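A sketch combining both tips (names are made up; the error table would be created beforehand with DBMS_ERRLOG.CREATE_ERROR_LOG):
-- Direct-path insert, with rejected rows diverted to an error table instead of failing the whole load
INSERT /*+ APPEND */ INTO newtable (id, name)
SELECT id, trim(name)
FROM oldtable
LOG ERRORS INTO err$_newtable ('migration run 1') REJECT LIMIT UNLIMITED;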

Send email for each row in a result set

I'd like to send an email for each row of a result set using sp_send_dbmail.
What is the appropriate way to accomplish this without using loops?
Edit: I'm not insisting that a loop is improper here, but is there a set-based way to do this? I've tried creating a function, but a function cannot call a stored proc inside it, only another function or an extended SP (which I'd rather not do either).
This case is exactly what loops are good for (and designed for).
Since you do things that fall out of database scope, it's perfectly legitimate to use loops for them.
Databases are designed to store data and to run queries against that data, returning it in the most convenient form.
Relational databases return data in the form of rowsets.
Cursors (and the loops that use them) are designed to hold a stable rowset so that something can be done with each of its rows.
By "things" here I mean not pure database tricks, but real things that affect the outer world, the things the database is designed for, be it displaying a table on a webpage, generating a financial report or sending an email.
It's bad to use cursors for pure database tasks (like transforming one rowset to another), but it's perfectly nice to use them for the things like that one you described.
Set-based methods are designed to work within a single transaction.
If your set-based query fails for some reason, your database will revert to the state it was in before, but you cannot "roll back" a sent email. You won't be able to keep track of your messages in case of an error.
It must be a row-by-row operation if you need an email per row. It's not a standard set based action.
Either you WHILE through it in SQL or you "for each" in a client language
I would not send emails from triggers BTW: your transaction is open while the trigger executes
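For completeness, a minimal sketch of the WHILE/cursor version (table, column, and profile names are made up):
-- Cursor over the result set, one sp_send_dbmail call per row
DECLARE @to VARCHAR(255), @body NVARCHAR(MAX);
DECLARE email_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT EmailAddress, MessageBody FROM dbo.PendingEmails;
OPEN email_cur;
FETCH NEXT FROM email_cur INTO @to, @body;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = 'MyMailProfile',   -- assumed Database Mail profile
        @recipients   = @to,
        @subject      = 'Notification',
        @body         = @body;
    FETCH NEXT FROM email_cur INTO @to, @body;
END
CLOSE email_cur;
DEALLOCATE email_cur;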
Not the best practice but if you want to avoid loops:
You could create a "SendMails" table, with a trigger on Insert
The sp_send_dbmail is called from inside the trigger
then you do:
Truncate Table SendMails
insert into SendMails ([From], [To], [Subject], [text]) select field1, field2, field3, field4 from MyTable
Set up a data-driven subscription in SQL Server Reporting Services :-D
Sounds like an SSRS requirement to me - TSQL isn't really designed for reporting in and of itself.
The best way to accomplish this is to put your email sending logic in a user defined function.
Then you would simply call SELECT MyEmailFunc(emailaddress) FROM MyTable
It avoids loops and you can even use it in an update statement to show that the email was sent. For example:
UPDATE MyTable SET Sent = MyEmailFunc(emailaddress) WHERE Sent = 0
