I mentioned this application in my earlier post about PBNI. The application (tax software) was written in PB/Java/EAF running on EAServer. It was developed about 8 years ago with the technologies available at the time. The application works fine, but there are legacy leftovers in the code/design that I am trying to clean up.
Some code performs database (Oracle) transactions across PB and Java, and since the two run in separate Oracle sessions, changes made in one aren't visible to the other. So, in these cases, the application uses a switch to run the complete transaction in PB code instead of splitting it across PB and Java. Otherwise, it uses the PB/Java combination.
What this means is that identical sets of program blocks exist in PB and Java. Maintenance nightmare!! I believe the PB objects were created first and someone ported them to Java for performance reasons (not considering the above split-transaction issue). I am trying to eliminate one (probably the PB code, considering performance). I am exploring PBNI in this context.
Please let me know, if any of you faced a similar situation and how you would solve it.
Thanks a lot in advance.
Sam
I don't claim to fully understand the nature of your application, but, please consider my comments.
Let PowerBuilder and Java perform necessary updates. It seems to me that you could commit transactions in either system and employ the idea of a logical commit. At the beginning of a transaction, update a column to indicate that the record is logically uncommitted. Java and PowerBuilder take turns updating and committing the record(s). Pass ROWID(s) between the two programs and a SELECT in either program would provide accurate data. When the transaction is logically complete, update the column to logically committed.
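A rough sketch of the logical-commit idea in SQL; the table and column names (tax_txn, logical_status, java_step_done) are made up for illustration, not taken from your application:

```sql
-- 1. PowerBuilder starts the work and marks the row logically open:
UPDATE tax_txn
   SET logical_status = 'PENDING'
 WHERE txn_id = :txn_id;
COMMIT;

-- 2. Java (a separate Oracle session) now sees the committed row,
--    does its part, and commits:
UPDATE tax_txn
   SET java_step_done = 'Y'
 WHERE txn_id = :txn_id;
COMMIT;

-- 3. Whichever side finishes last flips the flag to logically committed:
UPDATE tax_txn
   SET logical_status = 'COMMITTED'
 WHERE txn_id = :txn_id;
COMMIT;

-- Readers that only want finished work filter on the flag:
SELECT * FROM tax_txn WHERE logical_status = 'COMMITTED';
```

Because each physical transaction commits immediately, both sessions always see consistent data; only the flag distinguishes in-flight work from finished work.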
As for performance, moving business logic to an Oracle package or stored procedure is never a bad idea. It might take a little planning, but, the same code can run from PowerBuilder OR Java. Plus, there are some outstanding tuning tools for Oracle. Keep your transactions short and commit inside the package/procedure.
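A minimal sketch of the package idea, callable identically from PowerBuilder or Java; all names here are hypothetical:

```sql
CREATE OR REPLACE PACKAGE tax_calc AS
  PROCEDURE post_event(p_event_id IN NUMBER);
END tax_calc;
/
CREATE OR REPLACE PACKAGE BODY tax_calc AS
  PROCEDURE post_event(p_event_id IN NUMBER) IS
  BEGIN
    UPDATE tax_event
       SET status = 'POSTED'
     WHERE event_id = p_event_id;
    COMMIT;  -- keep the transaction short: commit inside the procedure
  END post_event;
END tax_calc;
/
```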
Don't be afraid to put logically incomplete transactions in a "work" table and copy the logically complete rows to the "complete" table.
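Sketched with hypothetical table names, the work/complete pattern is just a move on logical completion:

```sql
-- Copy logically complete rows to the "complete" table,
-- then clear them from the "work" table:
INSERT INTO tax_txn_complete
SELECT * FROM tax_txn_work
 WHERE txn_id = :txn_id
   AND logical_status = 'COMMITTED';

DELETE FROM tax_txn_work
 WHERE txn_id = :txn_id
   AND logical_status = 'COMMITTED';
COMMIT;
```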
Related
I have an old app running on SQL Server (though I suspect the concept I'm asking about applies to most/all major DBs) that got re-written. Aside from the fact that many UI changes and a more modular backend resulted in different queries, one of the major changes is that the old code used zero explicit transactions.
Yeah, as in if an error happened, you'd be left with orphan records and such. The new app has corrected that, using transactions when there are multiple inserts/updates. That seems to be a no-brainer, but we're finding that we're getting a lot of complaints about performance, particularly from clients that have more data (each client has their own separate DB). Am I correct in assuming that, given the transactions, there's a lot more room for resources to be waiting on locks, which could then drastically hurt performance?
Btw, another major difference is that the old app relied on stored procedures to a point, whereas the new app does not use them at all. I'm throwing this in here just in case, but I'm really under the impression that transactions being a problem is more likely, especially given the complexity of the queries in the system (tons of queries with lots of joins, subqueries in the SELECT clause, etc.)
Also, it's worth noting we're not talking huge databases/tables. Each client has their own database, and the clients complaining have tables with a few million records at worst, not billions/trillions of records or anything like that. While some new queries have been introduced and some have changed, the majority of queries are the same as in the old system, sometimes just running outside of a stored proc when before they were inside of one. Also, a lot of the more complicated queries have been checked, and when run on their own, they're fast.
We solved this by setting up a series of jobs/procs packaged as dbWarden. This allowed us to identify that there was a lot of blocking happening in certain cases. Once we figure that out, we made adjustments to avoid the blocking, including making certain operations read/write to different DBs, adjusting indexes, and making some operations query different data stores (ElasticSearch).
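If you want a quick hand-rolled check before adopting a tool like dbWarden, the blocking picture is visible in the SQL Server DMVs (available from SQL Server 2005 on); this shows every request currently blocked and who is blocking it:

```sql
SELECT r.session_id,
       r.blocking_session_id,   -- the session holding the lock
       r.wait_type,
       r.wait_time,             -- ms spent waiting so far
       t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```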
This is quite a common question, and yet the answers are unclear to me. I have two different databases on two different servers. One is a pure XML database and the other a traditional DBMS (SQL Server). Can anybody point me to recent articles, or share their experience, dealing with transaction management across the two? I have put together a one-phase-commit (1PC) strategy which works fine for runtime exceptions, but I am not sure it is bullet-proof. Secondly, in a Spring JUnit test, how do I specify a default rollback? It only rolls back the first transaction manager's transactions; the transactions in the other database remain committed.
Sounds like you want to use a ChainedTransactionManager.
Spring has implemented one of these for Neo4j, so you can take the code out of that project.
There was a good article on how to do this, but I can't find it anymore. Perhaps this is enough to get you started.
This seems to be an issue that keeps coming back in every web application; you're improving the back-end code and need to alter a table in the database in order to do so. No problem doing manually on the development system, but when you deploy your updated code to production servers, they'll need to automatically alter the database tables too.
I've seen a variety of ways to handle these situations, all come with their benefits and own problems. Roughly, I've come to the following two possibilities;
Dedicated update script. Requires manually initiating the update. Requires all table alterations to be done in a predefined order (rigid release planning, no easy quick fixes on the database). Typically requires maintaining a separate updating process and some way to record and manage version numbers. Benefit is that it doesn't impact running code.
Checking table properties at runtime and altering them if needed. No manual interaction required and table alters may happen in any order (so a quick fix on the database is easy to deploy). Another benefit is that the code is typically a lot easier to maintain. Obvious problem is that it requires checking table properties a lot more than it needs to.
Are there any other general possibilities or ways of dealing with altering database tables upon application updates?
I'll share what I've seen work best. It's just expanding upon your first option.
The steps I've usually seen when updating schemas in production:
Take down the front end applications. This prevents any data from being written during a schema update. We don't want writes to fail because relationships are messed up or a table is suddenly out of sync with the application.
Potentially disconnect the database so no connections can be made. Sometimes there is code out there using your database you don't even know about!
Run the scripts as you described in your first option. It definitely takes careful planning, and you're right that you need a pre-defined order to apply the changes. Also, note that you often need two sets of scripts: one for schema updates and one for data updates. For example, if you want to add a field that is not nullable, you might add a nullable field first, run a script to put in a default value, and only then add the NOT NULL constraint.
Have rollback scripts on hand. This is crucial because you might make all the changes you think you need (since it all worked great in development) and then discover the application doesn't work before you bring it back online. It's good to have an exit strategy so you aren't in that horrible place of "oh crap, we broke the application and we've been offline for hours and hours and what do we do?!"
Make sure you have backups ready to go in case (4) goes really bad.
Coordinate the application update with the database updates. Usually you do the database updates first and then roll out the new code.
(Optional) A lot of companies do partial roll outs to test. I've never done this, but if you have 5 application servers and 5 database servers, you can first roll out to 1 application/1 database server and see how it goes. Then if it's good you continue with the rest of the production machines.
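The two-phase "add a NOT NULL field" trick from step 3 might look like this (T-SQL syntax; table and column names are made up):

```sql
-- Schema script, phase 1: add the column as nullable
ALTER TABLE customer ADD region_code varchar(10) NULL;

-- Data script: backfill existing rows with a default
UPDATE customer
   SET region_code = 'UNKNOWN'
 WHERE region_code IS NULL;

-- Schema script, phase 2: now the constraint can be enforced
ALTER TABLE customer ALTER COLUMN region_code varchar(10) NOT NULL;
```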
It definitely takes time to find out what works best for you. From my experience doing lots of production database updates, there is no silver bullet. The most important thing is taking your time and being disciplined in tracking changes (versioning like you mentioned).
My company's workflow relies on two MSSQL databases: one for web content data and the other is the ERP. I've been doing some proof of concept on some tools that would serve as an intermediary building a relationship between the datasets, and thus far it's proving to be monumentally faster.
Instead of reading out to both datasets, I'd much rather house a database on the local Linux box that represents the data I'm working with. That way, it's less pressure on the system as a whole.
What I don't understand is whether there is a way to update this new database without completely dropping the table each time or running through a punishing line-by-line check. If the records had timestamps, this would be easy... but they don't.
Does anyone have any tips? Am I missing some crucial feature I don't know about, or am I SOL?
Finally, is there one preferred database stack out there anyone thinks might work better than another? I'm not committed to any technology at this point.
Thanks!
Have you read about the MERGE statement in SQL? It performs updates or inserts on existing tables in a single statement.
I assume your tables have primary keys even though you say there is no timestamp.
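A sketch of what that MERGE could look like, keyed on the primary key (SQL Server 2008+ syntax; the table and column names here are illustrative only):

```sql
MERGE INTO local_table AS tgt          -- the copy on your Linux box
USING source_table AS src              -- data pulled from the web/ERP DB
   ON tgt.id = src.id                  -- join on the primary key
WHEN MATCHED AND (tgt.col1 <> src.col1 OR tgt.col2 <> src.col2) THEN
    UPDATE SET col1 = src.col1,
               col2 = src.col2         -- refresh changed rows only
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, col1, col2)
    VALUES (src.id, src.col1, src.col2)  -- pick up new rows
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;                            -- drop rows removed upstream
```

This avoids both the full drop-and-reload and the row-by-row comparison, at the cost of comparing the columns you care about in the MATCHED clause.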
Need a little advice here. We do some Windows Mobile development using the .NET Compact Framework and SQL CE on the mobile, along with a central SQL Server 2005 database at the customers' offices. Currently we synchronize the data using merge replication technology.
Lately we've had some annoying problems with synchronization throwing errors and generally being a bit unreliable. This is compounded by the fact that there seems to be limited information out there on replication issues. This suggests to me that it isn't a commonly used technology.
So, I was just wondering if replication is the way to go for synchronizing data, or are there more reliable methods? I was thinking web services, maybe, or something like that. What do you use to implement this kind of solution?
Dave
I haven't used replication a great deal, but I have used it and I haven't had problems with it. The thing is, you need to set things up carefully. No matter which method you use you need to decide on the rules governing all of the various possible situations - changes in both databases, etc.
If you are more specific about the "generally being a bit unreliable" then maybe you'll get more useful advice. As it is all I can say is, I haven't had issues with it.
EDIT: Given your response below I'll just say that you can certainly go with a custom replication that uses SSIS or some other method, but there are definitely shops out there using replication successfully in a production environment.
Well, we've had the error occur twice, and it was a real pain to fix:
The insert failed. It conflicted with an identity range check constraint in database 'egScheduler', replicated table 'dbo.tblServiceEvent', column 'serviceEventID'. If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange; for the Subscriber, run the Distribution Agent or the Merge Agent.
When we tried running the stored procedure it messed with the identities so now when we try to synchronize it throws the following error in the replication monitor.
The row operation cannot be reapplied due to an integrity violation. Check the Publication filter. [,,,Table,Operation,RowGuid] (Source: MSSQLServer, Error number: 28549)
We've also had a few issues where snapshots became invalid, but these were relatively easy to fix. However, all this is making me wonder whether replication is the best method for what we're trying to do here or whether there's an easier method. This is what prompted my original question.
We're working on a similar situation, but ours is involved with programming a tool that works in a disconnected model, and runs on the Windows Desktop... We're using SQL Server Compact Edition for the clients and Microsoft SQL Server 2005 with a web service for the server solution.
To enable synchronization, we initially started by building our own synchronization framework, but after many issues with keeping that framework in sync with the rest of the system, we opted to go with the Microsoft Synchronization Framework (http://msdn.microsoft.com/en-us/sync/default.aspx for reference). Our initial requirements were to make the application as easy to install and use as other packages like Intuit QuickBooks, and I think we have come close to succeeding.
The Synchronization Framework from Microsoft has its ups and downs, but the only bad thing that I can say at this point is that documentation is horrendous.
We're in discussions now to decide whether or not to continue using it or to go back to maintaining our own synchronization subsystem. YMMV on it, but for us, it was a quick fix to the issue.
You're definitely pushing the stability envelope for CE, aren't you?
When I've done this, I've found it necessary to add a fair amount of conflict tolerance, by thinking of it not so much as synchronization but as simultaneous asynchronous data collection, with intermittent mutual updates and/or refreshes. In particular, I've always avoided using identity columns for anything. If you can strictly adhere to true primary keys based on real (not surrogate) data, it makes things easier. Sometimes a PK comprising SourceUnitNumber and a timestamp works well.
If you can, view the remotely collected data as a simple timestamped, source-ID'd, user-ID'd log of cumulative, chronologically ordered transactions. Going the other way, the host provides static validation info which never needs to go back; send back the CRUD transactions instead.
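A rough DDL sketch of that log design, with no identity column and a PK built from real data (column names are illustrative; SourceUnitNumber and the timestamp come from the collecting device):

```sql
CREATE TABLE collected_event (
    source_unit_no  int          NOT NULL,  -- which device collected it
    collected_at    datetime     NOT NULL,  -- device-local timestamp
    user_id         varchar(30)  NOT NULL,  -- who performed the action
    operation       char(1)      NOT NULL,  -- 'C', 'U', or 'D'
    payload         varchar(max) NULL,      -- the transaction details
    CONSTRAINT pk_collected_event
        PRIMARY KEY (source_unit_no, collected_at, user_id)
);
```

Because the key is derived from the source and time of collection, two devices can never mint conflicting keys, which is exactly the failure mode that replicated identity ranges invite.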
Post back how this turns out. I'm interested in seeing any kind of reliable Microsoft technology that helps with this.
TomH & le dorfier - I think part of our problem is that we're allowing the customer to insert a large number of rows into one of the replicated tables with an identity field. It's a scheduling application which can automatically create multiple tasks up to a specified month/year. One of the times it failed was around the time they entered 15,000 rows into the table. We'll look into increasing the identity range.
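For increasing the range, the merge replication stored procedures look roughly like the following. This is a hedged sketch only: the publication name is made up, and you should verify the exact parameter names and range-size values against your SQL Server 2005 documentation before running anything:

```sql
-- Enlarge the publisher-side identity range allocated to the article:
EXEC sp_changemergearticle
     @publication = N'egSchedulerPub',   -- hypothetical publication name
     @article     = N'tblServiceEvent',
     @property    = N'pub_identity_range',
     @value       = N'100000';

-- Then re-seed the publisher's range, as the original error suggested:
EXEC sp_adjustpublisheridentityrange
     @table_name = N'tblServiceEvent';
```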
The Synchronization Framework sounds interesting, but it seems to suffer from a problem similar to replication's: poor documentation. Trying to find help on replication is a bit of a nightmare, and I'm not sure I want us to move to something with similar issues. Wish M'soft would stop releasing stuff that has the support of beta s'ware!