This is quite a common question and yet the answers are unclear to me. I have 2 different databases on 2 different servers. One is a pure xml database and the other a traditional dbms (sql server). Can anybody point me to recent articles or their experience in dealing with transaction management. I have put together a 1pc strategy which works fine for runtime exceptions. However, I am not sure if it is bullet-proof. Secondly, using spring junit test how to specify a default rollback? It only rolls back the first transactionmanager's transactions. The other transactions are stored in the other database.
Sounds like you want to use a ChainedTransactionManager.
Spring has implemented one of these for neo4j, so you can take the code out of the project.
There was a good article on how to do this, but cant find it anymore. But perhaps this is enough to get you started..
Related
[Some background information - possibly skippable]
To begin with, I have barely any understanding of database management
and just shallow experience with mongoose and node in the backend
realm(a couple of udemy courses). Udemy courses made me believe that
mongodb was still a viable choice for a database with relational
properties and off I went working on a backend for a forum-like
website. After learning about transactions, I attempted to implement
them in my backend, since it seemed perfectly necessary to implement a
rollback feature when executing an array of queries. However it turns
out that transactions are only possible on replica sets - which also
appeared to require a minimum of 2. 3 databases for a startup MVP was
obviously considered an overkill.
[The question]
Is it possible to implement transactions with only 1 database? If so how?
If the above is not possible, how would one launch a mongodb database with minimum configuration while implementing transactions, with the fact that the database is for a startup MVP in consideration.
(For anyone who has experience in implementing a production-level mongo database in a similar scenario) If not using transactions was considered, how to send queries editing/creating multiple documents to a mongoDB database safely without transactions, while not spraying every bit of code with try-catches consisting of queries to rollback every point of failure(which I considered as way too much overhead)
I have a tight deadline, and have already done a substantial bit of groundwork and a couple of routes using mongoose, which means ditching mongodb for a relational database will be a difficult option at the moment.
I think I've googled everything related to the subject at matter and even tried blogs / articles in the second page of a google search(which many including myself consider as the dark web /s).
Yet I do think I may have missed what I was looking for and an answer consisting of just links is also welcome.
Thank you for reading!
You need a replica set[*] to use transactions, but you can create a single-node replica set for testing purposes.
The full procedure is described in documentation, for a single-node RS you follow that as written but only configure a single member.
Briefly, you need to pass --replSet argument to mongod and then connect to it through mongo shell and run rs.initiate().
Note that transactions aren't a magic solution to all problems. There are scenarios where they are appropriate and scenarios where MongoDB provides other functionality that would be a better fit.
[*] or a sharded cluster with MongoDB 4.2+, but this involves more setup work.
I'm looking for a portable database solution I can use with a website that is designed to handle service outages. I need to nightly retrieve a list of users from SQL Server and upsert their details into a portable database. It's roughly about 250,000 users (and growing) and each one has probably 25 fields that are required. Of those fields, i'd say less than 5 need to be searched on. The rest just need retrieving.
The idea is, in times of a service outage, we can use a website that's designed to work from the portable database rather than SQL Server. Our long term goal, is to move to the cloud and handle things in an entirely different way, but for the short term this is our aim.
The website is going to be a .Net Core web api so will be being accessed by multiple users in multiple threads. The website will only ever need read access, it will not be updating these details what-so-ever.
To keep the portable database up-to-date i'm thinking of having another application that just runs nightly to update the data. Our business is 24 hours (albeit quieter overnight), so there is a potential this updater is in use while the website is in use. While service outage would assume the SQL Server is down, this may not be the case. There are other factors in play that could cause what we would describe as outages. This will be the only piece of software updating the database.
I've tried using LiteDB but I couldn't get it working in a way that worked with my concurrency requirements. It did seem to do some of the job, and was easy to get running. However, i'd often run into locked files due to the nature of web api. I did work out a solution for that, but then the updater app couldn't access the database file.
Does anyone have any recommendations I can look into?
Given the description of the problem (1 table, 250k rows with - I assume - relative fast growth rate) and requirements, I don't think a relational database is what you are looking for.
I think nosql databases, or, more specifically, document oriented databases are more fitted to meet your requirements. There are many choices: Mongo, Cassandra, CouchDB, ... the choice is yours.
Personally I have some experience with ElasticSearch (https://www.elastic.co/elasticsearch), that is quite easy to learn, is portable (runs on Linux, Windows, Containers, etc...), is scalable, and it is fast. I mean, really, really fast, you can get results in 10-20 milliseconds (even less, sometimes).
The NEST nuget package acts as a high level client for working with ElasticSearch (https://www.elastic.co/guide/en/elasticsearch/client/net-api/7.x/nest-getting-started.html)
I'm trying to design a database for a small project I'm working on. Eventually, I'd like to make it a web-app, but right now I don't mind just experimenting with data offline. However, I'm stuck in a crossroads.
The basic concept would be a user inputs values for 10 fields, to be compared against what is in the database, with each item having a weighted value. I know that if I was to code it, that I could use a look-up tables for each field, add up the values, and display the result to the end-user.
Another example would be having to get the distance between two points, each point stored in a row, with the X value getting its own column as well as the Y value.
Now, if I store data within a database, should I try to do everything within queries (which I think would involve temporary tables among other things), or just use simple queries, and manipulate the rows returned within the application code?
Right now, I'm thinking to go for the latter (manipulate data within the app) and just use queries to reduce the amount of data that I would have to go sort through. What would you guys suggest?
EDIT: Right now I'm using Microsoft Access to get the basics down pat and try to get a good design going. IIRC with my experience with Oracle and MySQL you can run commands together in a batch process and return just one result. But not sure if you can do that with Access.
If you're using a database I would strongly suggest using SQL to do all your manipulation. SQL is far more capable and powerful for this kind of job as compared to imperative programming languages.
Of course it does imply that you're comfortable in thinking about data as "sets" and programming in a declarative style. But spending time now to get really comfortable with SQL and manipulating data using SQL will pay off big time in the long run. Not only for this project but for projects in the future. I would also suggest using stored procedures over queries in code because stored procedure provide a beautiful abstraction layer allowing your table design to change over time without impacting the rest of the system.
A very big part of using and working with databases is understanding Data modeling, normalization and the like. Like everything else it will be a effort but in the long run it will pay off.
May I ask why you're using Access when you have a far better database available to you such as MSSQL Express? The migration path from MSSQL Express to MSSQL or SQL Azure even is quite seamless and everything you do and experience today (in this project) completely translates to MSSQL Server/SQL Azure for future projects as well as if this project grows beyond your expectations.
I don't understand your last statement about running a batch process and getting just one result, but if you can do it in Oracle and MySQL then you can do it in MSSQL Express as well.
What Shiv said, and also...
A good DBMS has quite a bit of solid engineering in it. There are two components that are especially carefully engineered, namely the query optimizer and the transaction controller. If you adopt the view of using the DBMS as just a stupid table retrieval tool, you will most likely end up inventing your own optimizer and transaction controller inside the application. You won't need the transaction controller until you move to an environment that supports multiple concurrent users.
Unless your engineering talents are extraordinary, you will probably end up with a home brew data management system that is not as good as the one in a good DBMS.
The learning curve for SQL is steep. You need to learn how to phrase queries that join, project, and restrict data from multiple tables. You need to learn how to handle updates in the context of a transaction.
You need to learn simple and sound table design and index design. This includes, but is not limited to, data normalization and data modeling. And you need a DBMS with a good optimizer and good transaction control.
The learning curve is steep. But the view from the top is worth the climb.
Does it make sense to use an OR-mapper?
I am putting this question of there on stack overflow because this is the best place I know of to find smart developers willing to give their assistance and opinions.
My reasoning is as follows:
1.) Where does the SQL belong?
a.) In every professional project I have worked on, security of the data has been a key requirement. Stored Procedures provide a natural gateway for controlling access and auditing.
b.) Issues with Applications in production can often be resolved between the tables and stored procedures without putting out new builds.
2.) How do I control the SQL that is generated? I am trusting parse trees to generate efficient SQL.
I have quite a bit of experience optimizing SQL in SQL-Server and Oracle, but would not feel cheated if I never had to do it again. :)
3.) What is the point of using an OR-Mapper if I am getting my data from stored procedures?
I have used the repository pattern with a homegrown generic data access layer.
If a collection needed to be cached, I cache it. I also have experience using EF on a small CRUD application and experience helping tuning an NHibernate application that was experiencing performance issues. So I am a little biased, but willing to learn.
For the past several years we have all been hearing a lot of respectable developers advocating the use of specific OR-Mappers (Entity-Framework, NHibernate, etc...).
Can anyone tell me why someone should move to an ORM for mainstream development on a major project?
edit: http://www.codinghorror.com/blog/2006/06/object-relational-mapping-is-the-vietnam-of-computer-science.html seems to have a strong discussion on this topic but it is out of date.
Yet another edit:
Everyone seems to agree that Stored Procedures are to be used for heavy-duty enterprise applications, due to their performance advantage and their ability to add programming logic nearer to the data.
I am seeing that the strongest argument in favor of OR mappers is developer productivity.
I suspect a large motivator for the ORM movement is developer preference towards remaining persistence-agnostic (don’t care if the data is in memory [unless caching] or on the database).
ORMs seem to be outstanding time-savers for local and small web applications.
Maybe the best advice I am seeing is from client09: to use an ORM setup, but use Stored Procedures for the database intensive stuff (AKA when the ORM appears to be insufficient).
I was a pro SP for many, many years and thought it was the ONLY right way to do DB development, but the last 3-4 projects I have done I completed in EF4.0 w/out SP's and the improvements in my productivity have been truly awe-inspiring - I can do things in a few lines of code now that would have taken me a day before.
I still think SP's are important for some things, (there are times when you can significantly improve performance with a well chosen SP), but for the general CRUD operations, I can't imagine ever going back.
So the short answer for me is, developer productivity is the reason to use the ORM - once you get over the learning curve anyway.
A different approach... With the raise of No SQL movement now, you might want to try object / document database instead to store your data. In this way, you basically will avoid the hell that is OR Mapping. Store the data as your application use them and do transformation behind the scene in a worker process to move it into a more relational / OLAP format for further analysis and reporting.
Stored procedures are great for encapsulating database logic in one place. I've worked on a project that used only Oracle stored procedures, and am currently on one that uses Hibernate. We found that it is very easy to develop redundant procedures, as our Java developers weren't versed in PL/SQL package dependencies.
As the DBA for the project I find that the Java developers prefer to keep everything in the Java code. You run into the occassional, "Why don't I just loop through all the Objects that just returned?" This caused a number of "Why isn't the index taking care of this?" issues.
With Hibernate your entities can contain not only their linked database properties, but can also contain any actions taken upon them.
For example, we have a Task Entity. One could Add or Modify a Task among other things. This can be modeled in the Hibernate Entity in Named Queries.
So I would say go with an ORM setup, but use procedures for the database intensive stuff.
A downside of keeping your SQL in Java is that you run the risk of developers using non-parameterized queries leaving your app open to a SQL Injection.
The following is just my private opinion, so it's rather subjective.
1.) I think that one needs to differentiate between local applications and enterprise applications. For local and some web applications, direct access to the DB is okay. For enterprise applications, I feel that the better encapsulation and rights management makes stored procedures the better choice in the end.
2.) This is one of the big issues with ORMs. They are usually optimized for specific query patterns, and as long as you use those the generated SQL is typically of good quality. However, for complex operations which need to be performed close to the data to remain efficient, my feeling is that using manual SQL code is stilol the way to go, and in this case the code goes into SPs.
3.) Dealing with objects as data entities is also beneficial compared to direct access to "loose" datasets (even if those are typed). Deserializing a result set into an object graph is very useful, no matter whether the result set was returned by a SP or from a dynamic SQL query.
If you're using SQL Server, I invite you to have a look at my open-source bsn ModuleStore project, it's a framework for DB schema versioning and using SPs via some lightweight ORM concept (serialization and deserialization of objects when calling SPs).
Need a little advice here. We do some windows mobile development using the .NET Compact framework and SQL CE on the mobile along with a central SQL 2005 database at the customers offices. Currently we synchronize the data using merge replication technology.
Lately we've had some annoying problems with synchronization throwing errors and generally being a bit unreliable. This is compounded by the fact that there seems to be limited information out there on replication issues. This suggests to me that it isn't a commonly used technology.
So, I was just wondering if replication was the way to go for synchronizing data or are there more reliable methods? I was thinking web services maybe or something like that. What do you guys use for this implementing this solution?
Dave
I haven't used replication a great deal, but I have used it and I haven't had problems with it. The thing is, you need to set things up carefully. No matter which method you use you need to decide on the rules governing all of the various possible situations - changes in both databases, etc.
If you are more specific about the "generally being a bit unreliable" then maybe you'll get more useful advice. As it is all I can say is, I haven't had issues with it.
EDIT: Given your response below I'll just say that you can certainly go with a custom replication that uses SSIS or some other method, but there are definitely shops out there using replication successfully in a production environment.
well we've had the error occur twice which was a real pain fixing :-
The insert failed. It conflicted with an identity range check constraint in database 'egScheduler', replicated table 'dbo.tblServiceEvent', column 'serviceEventID'. If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange; for the Subscriber, run the Distribution Agent or the Merge Agent.
When we tried running the stored procedure it messed with the identities so now when we try to synchronize it throws the following error in the replication monitor.
The row operation cannot be reapplied due to an integrity violation. Check the Publication filter. [,,,Table,Operation,RowGuid] (Source: MSSQLServer, Error number: 28549)
We've also had a few issues were snapshots became invalid but these were relatively easy to fix. However all this is making me wonder whether replication is the best method for what we're trying to do here or whether theres an easier method. This is what prompted my original question.
We're working on a similar situation, but ours is involved with programming a tool that works in a disconnected model, and runs on the Windows Desktop... We're using SQL Server Compact Edition for the clients and Microsoft SQL Server 2005 with a web service for the server solution.
TO enable synchronization services, we initially started by building our own synchronization framework, but after many issues with keeping that framework in sync with the rest of the system, we opted to go with Microsoft Synchronization Framework. (http://msdn.microsoft.com/en-us/sync/default.aspx for reference). Our initial requirements were to make the application as easy to use as installing other packages like Intuit QuickBooks, and I think that we have closely succeeded.
The Synchronization Framework from Microsoft has its ups and downs, but the only bad thing that I can say at this point is that documentation is horrendous.
We're in discussions now to decide whether or not to continue using it or to go back to maintaining our own synchronization subsystem. YMMV on it, but for us, it was a quick fix to the issue.
You're definitely pushing the stability envelope for CE, aren't you?
When I've done this, I've found it necessary to add in a fair amount of conflict tolerance, by not thinking of it so much as synchronization as simultaneous asynchronous data collection, with intermittent mutual updates and/or refreshes. In particular, I've always avoided using identity columns for anything. If you can strictly adhere to true Primary Keys based on real (not surrogate) data, it makes things easier. Sometimes a PK comprising SourceUnitNumber and timestamp works well.
If you can, view the remotely collected data as a simple timestamped, sourceided, userided log of cumulative chronologically ordered transactions. Going the other way, the host provides static validation info which never needs to go back - send back the CRUD transactions instead.
Post back how this turns out. I'm interested in seeing any kind of reliable Microsoft technology that helps with this.
TomH & le dorfier - I think that part of our problem is that we're allowing the customer to insert a large number of rows into one of the replicated table with an identity field. Its a scheduling application which can automatically multiple tasks up to a specified month/year. One of the times that it failed was around the time they entered 15000 rows into the table. We'll look into increasing the identity range.
The synchronization framework sounds interesting but sounds like it suffers from a similar problem to replication of having poor documentation. Trying to find help on replication is a bit of a nightmare and I'm not sure I want us to move to something with similar issues. Wish M'soft would stop releasing stuff that seems to have the support of beta s'ware!