In the business I work for, we are discussing ways to reduce the read load on our primary database.
One option that has been suggested is to have live one-way replication from our primary database to a slave database. Applications would then read from the slave database and write directly to the primary database. So...
Application Reads From Slave
Application Writes to Primary
Primary Updates Slave Automatically
What are the major pros and cons of this approach?
A few cons:
Two points of failure
Application logic will have to account for the delay between writing something and reading it back, since the write won't be visible on the secondary database immediately
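One common way to handle that replication delay is "read-your-writes" routing: after a client writes, send that client's reads to the primary for a short window, and send everyone else's reads to the secondary. This is a minimal sketch that assumes a fixed lag allowance (`lag_window` seconds) and only returns the name of the server to use; the class and parameter names are hypothetical, and a real system would ideally measure actual replica lag rather than guess it.

```python
import time
from typing import Callable, Dict


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary; reads go to the
    secondary unless the same session wrote recently."""

    def __init__(self, lag_window: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lag_window = lag_window  # assumed upper bound on replication lag
        self.clock = clock
        self._last_write: Dict[str, float] = {}  # session id -> time of last write

    def for_write(self, session: str) -> str:
        # Remember when this session last wrote so its reads can be pinned.
        self._last_write[session] = self.clock()
        return "primary"

    def for_read(self, session: str) -> str:
        last = self._last_write.get(session)
        if last is not None and self.clock() - last < self.lag_window:
            # The secondary may not have this session's write yet.
            return "primary"
        return "secondary"
```

Injecting `clock` makes the routing easy to test: with a fake clock, a session that wrote at t=0 reads from the primary at t=1 and from the secondary at t=5, while a session that never wrote always reads from the secondary.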
A strategy I have used is to send key reporting data to a secondary database nightly, denormalizing it on the way, so that beefy queries can run on that database instead of locking up tables and stealing resources from the OLTP server. I'm not using any formal data warehousing or replication tools; rather, I identify problem queries that are OK without up-to-the-minute data and create data structures on the secondary server specifically for those queries.
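A minimal sketch of that kind of nightly job, using two SQLite in-memory databases to stand in for the OLTP server and the reporting server. The table and column names (`customers`, `orders`, `sales_by_region`) are hypothetical; the point is that the join and aggregation run once per night on the OLTP side and the result lands in a flat table shaped for one report.

```python
import sqlite3


def create_oltp() -> sqlite3.Connection:
    # Hypothetical normalized OLTP schema.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customers(id),
                             order_date TEXT, amount REAL);
    """)
    return db


def create_reporting() -> sqlite3.Connection:
    # Denormalized table built for one specific report.
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE sales_by_region (
                      order_date TEXT, region TEXT,
                      order_count INTEGER, total_amount REAL,
                      PRIMARY KEY (order_date, region))""")
    return db


def nightly_refresh(oltp: sqlite3.Connection, reporting: sqlite3.Connection) -> int:
    # One read pass over the OLTP tables; join and aggregate on the way out.
    rows = oltp.execute("""
        SELECT o.order_date, c.region, COUNT(*), SUM(o.amount)
        FROM orders o JOIN customers c ON c.id = o.customer_id
        GROUP BY o.order_date, c.region
    """).fetchall()
    # Replace the reporting copy wholesale in one transaction
    # (the connection context manager commits, or rolls back on error).
    with reporting:
        reporting.execute("DELETE FROM sales_by_region")
        reporting.executemany("INSERT INTO sales_by_region VALUES (?, ?, ?, ?)", rows)
    return len(rows)
```

Because the job deletes and reloads the reporting table, rerunning it after a failure is safe.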
There are definitely pros to the "replicate everything" approach:
You can run any ad-hoc query on the secondary, since it has all of your data
If your primary server dies, you can re-purpose the secondary quickly to take over
We use one-way replication, but not for the application itself. Our applications read from and write to the master database, the data is synchronized to the replica database, and the reporting tools use this replica.
We don't want our application to read from a different database, so in this scenario I would suggest using file groups and partitioning on the master database. Using file groups (especially on different drives) and partitioning tables and indexes can improve performance considerably.
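To illustrate how partitioning splits a table, here is a small Python model of how a SQL Server range partition function assigns a value to a partition from its boundary values (the 1-based number that `$PARTITION` reports). It models only the mapping logic, not SQL Server itself, and the boundary values are made up. With n boundaries there are n + 1 partitions; under `RANGE LEFT` a boundary value belongs to the partition on its left, under `RANGE RIGHT` to the partition on its right.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def partition_number(value, boundaries: Sequence, range_right: bool = False) -> int:
    """Return the 1-based partition number for `value`, modelling a SQL Server
    range partition function with sorted `boundaries`.

    RANGE LEFT:  each boundary is the inclusive upper end of its partition.
    RANGE RIGHT: each boundary is the inclusive lower start of the next partition.
    """
    if list(boundaries) != sorted(boundaries):
        raise ValueError("boundaries must be sorted")
    if range_right:
        # A value equal to a boundary moves to the partition on its right.
        return bisect_right(boundaries, value) + 1
    # A value equal to a boundary stays in the partition on its left.
    return bisect_left(boundaries, value) + 1
```

For yearly boundaries `["2023-01-01", "2024-01-01"]` with `RANGE RIGHT`, a mid-2023 date maps to partition 2 and the boundary date `2024-01-01` itself to partition 3. A partition scheme can then place each partition on a different file group, which is where the separate drives come in.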
Currently, I generate data in a separate datastore and replicate it to Snowflake staging; from there it moves to the data warehouse DB through ELT ingestion for analytics. However, this approach arguably creates data silos of its own, since we already have three copies of the same data:
Transactional data-store DB
Replicated Snowflake staging
Snowflake Data Warehouse DB
From a technical architecture point of view, is it a good idea to use Snowflake as the direct datastore for a transactional application (one that does many CRUD operations)? That might avoid the cost of replication and ingestion.
The main problem I see with this approach is that Snowflake does not enforce referential integrity (primary keys, foreign keys), so within the CRUD app I would have to either always use a MERGE statement or otherwise make sure I don't create duplicate records.
The other problem is that Snowflake runs in the cloud, so the network distance between the app and Snowflake determines the performance of the transactions, and I want good, consistent performance for my CRUD operations.
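For reference, the "always use MERGE" approach amounts to writing every insert as an upsert. A minimal sketch that builds a single-row `MERGE` for a hypothetical table, assuming the Snowflake Python connector's default client-side `pyformat` binding (`%(name)s` placeholders); identifiers can't be bound as parameters, so they are validated instead. This only protects writes that go through this code path; any other loader can still insert duplicates, since the key isn't enforced.

```python
import re
from typing import Sequence

# Identifiers can't be bind parameters, so only allow plain unquoted names.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_upsert(table: str, key_cols: Sequence[str], value_cols: Sequence[str]) -> str:
    """Build a single-row MERGE that updates the row if the key exists and
    inserts it otherwise. Values are bound later as %(col)s parameters."""
    if not key_cols or not value_cols:
        raise ValueError("need at least one key column and one value column")
    cols = [*key_cols, *value_cols]
    for name in [table, *cols]:
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    source = ", ".join(f"%({c})s AS {c}" for c in cols)
    match = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    updates = ", ".join(f"t.{c} = s.{c}" for c in value_cols)
    return (
        f"MERGE INTO {table} t USING (SELECT {source}) s ON {match} "
        f"WHEN MATCHED THEN UPDATE SET {updates} "
        f"WHEN NOT MATCHED THEN INSERT ({', '.join(cols)}) "
        f"VALUES ({', '.join('s.' + c for c in cols)})"
    )
```

The statement would then be run with a parameter dict, e.g. `cursor.execute(build_upsert("customers", ["id"], ["name"]), {"id": 1, "name": "Ann"})`, where `cursor` and the `customers` table are hypothetical.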
Any thoughts/suggestions are much appreciated.
As of today, Snowflake does not perform well with singleton updates and inserts, which are what we mostly see with transactional databases. I have seen performance degrade when singleton inserts are submitted to Snowflake.
By contrast, it is highly optimized for bulk ingestion of both structured and unstructured data, and it is designed for OLAP warehouses. You can still use it for transactional work, but you may see the same performance degradation. Also, primary keys can be defined, but they are not enforced.
In my opinion, if you face that challenge, one option is to use a PostgreSQL database (open source) in the cloud as your transactional database; it complements Snowflake well as the OLAP database.
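If some writes must land in Snowflake anyway, a common mitigation for slow singleton inserts is to buffer rows in the application and send them in batches. A generic sketch: `flush` is a hypothetical callback you would wire to a multi-row insert or a bulk load (for example, staging a file and running `COPY INTO`), and the batch size is an arbitrary example.

```python
from typing import Any, Callable, List, Sequence

Row = Sequence[Any]


class BatchWriter:
    """Collects rows and hands them to `flush` in groups of `batch_size`,
    turning many singleton inserts into a few bulk writes."""

    def __init__(self, flush: Callable[[List[Row]], None], batch_size: int = 1000) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be positive")
        self._flush = flush
        self._batch_size = batch_size
        self._buffer: List[Row] = []

    def add(self, row: Row) -> None:
        self._buffer.append(row)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered, even a partial batch. The buffer is only
        # reset after the callback returns, so a failed flush keeps the rows.
        if self._buffer:
            self._flush(self._buffer)
            self._buffer = []

    def __enter__(self) -> "BatchWriter":
        return self

    def __exit__(self, *exc: object) -> None:
        # Flush the remaining rows when the with-block exits.
        self.flush()
```

With `batch_size=2`, five `add` calls inside a `with` block produce batches of 2, 2 and 1 rows.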
No. Snowflake isn't good as a transactional / OLTP database for the reasons you've mentioned. Plus, it won't perform well with many individual CRUD operations due to how they structure the data (optimised for OLAP workloads).
Just want to point out that there are benefits to keeping separate databases. First, you want to isolate your transactional database from your analytics database; otherwise you could significantly affect the application's performance. Second, the data in the transactional database can change, so if you had to reprocess the data for whatever reason, you might not be able to. There are many more, but I will stop here :-)
I manage an online bookstore website. For high availability I've set up two Tomcat instances running the website application; they run exactly the same program and share the same database, which is located on another server.
My question is: how can I avoid conflicts or dirty data when the two applications perform the same updates/inserts on the database at the same time?
For example: update t_sale set total='${num}' where category='cs'. If two processes execute this SQL simultaneously, data could be lost.
If by "database" you are talking about a well designed schema that is running on an RDBMS such as Oracle, DB2, or SQL Server, then the database itself will prevent what you call "conflicts" by locking parts of the database during each update transaction.
You can prevent "dirty data" from getting into the database by adding features such as check constraints and primary/foreign key constraints to the database itself.
A single application server can currently handle about 5,000 concurrent requests. However, the user base will grow to millions, so I may need two application servers to handle the requests.
The plan is to put a load balancer in front, hoping it will handle over 10,000 concurrent requests. However, each user's data is stored in one single database. So, if the design has two or more servers, should I do the following?
Run two database instances
Sync the two databases in real time
Is this correct?
If so, will the sync process reduce the servers' performance, since database replication seems costly?
Thank you.
You probably want to think of your service in "tiers". In this instance, you've got two tiers; the application tier and the database tier.
Typically, your application tier is going to be considerably easier to scale horizontally (i.e. by adding more application servers behind a load balancer) than your database tier.
With that in mind, the best approach is probably to overprovision your database (i.e. put it on its own, meaty server) and have your application servers all connect to that same database. Depending on the database software you're using, you could also look at using read replicas (AWS docs) to reduce the strain on your database.
You can also look at caching via Memcached / Redis to reduce the amount of load you're placing on the database.
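Caching like this is usually implemented as cache-aside: check the cache, fall back to the database on a miss, and store the result with an expiry. A minimal sketch with a plain dict standing in for Memcached/Redis; `load_from_db` and the TTL are hypothetical, and with several application servers you would use the shared cache server (and its own expiry) rather than a per-process dict.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class CacheAside:
    """Cache-aside lookup with a per-entry TTL. The dict stands in for a
    shared cache such as Memcached or Redis."""

    def __init__(self, load_from_db: Callable[[Hashable], Any], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: Hashable) -> Any:
        hit = self._entries.get(key)
        if hit is not None and hit[0] > self._clock():
            return hit[1]  # fresh cache hit: no database round trip
        value = self._load(key)  # miss or expired: read from the database
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        # Call after writing `key` so the next read fetches the new value.
        self._entries.pop(key, None)
```

Every hit within the TTL is a query the database never sees, which is where the load reduction comes from.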
So – tl;dr – put your DB on its own, big, server, and spread your application code across many small servers, all connecting to that same DB server.
A cost-effective option could be to synchronize a standby node with data from the active node, which is achievable with an open-source relational database (e.g. MariaDB).
Don't store results and statistics that can easily be computed at run time; this helps reduce data size.
If historical data isn't needed urgently for queries, it can be written to text files in a format that is easy to import into a database (e.g. .csv).
Frequently updated data objects can be kept in an in-memory database as key-value pairs, with a scheduled task performing batch updates/inserts into the relational database for persistence.
Implement retry logic for the database batch-update tasks to handle database downtime or network errors.
Consider writing data to the relational database as serialized objects.
Cache configuration data from the database in memory, refreshing it either periodically or through an API call that reloads only the changed part.
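The in-memory key-value and retry suggestions above can be combined into a write-behind buffer. A sketch under assumptions: `write_batch` is a hypothetical function that performs the batch update/insert and raises on failure, repeated updates to the same key are coalesced so only the latest value is written, and the retry count and backoff are arbitrary. Scheduling (a timer or cron-style task calling `flush`) is left out, and the sketch is single-threaded.

```python
import time
from typing import Any, Callable, Dict, Hashable


class WriteBehindBuffer:
    """Keeps frequently updated values in memory and persists them in batches."""

    def __init__(self, write_batch: Callable[[Dict[Hashable, Any]], None],
                 retries: int = 3, backoff: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._write_batch = write_batch
        self._retries = retries
        self._backoff = backoff
        self._sleep = sleep
        self._dirty: Dict[Hashable, Any] = {}

    def put(self, key: Hashable, value: Any) -> None:
        # Later updates overwrite earlier ones, so only the latest value is written.
        self._dirty[key] = value

    def flush(self) -> bool:
        """Meant to be called by a scheduled task. Returns True on success;
        on failure the data stays buffered for the next run."""
        if not self._dirty:
            return True
        batch = dict(self._dirty)  # snapshot handed to the database writer
        for attempt in range(self._retries):
            try:
                self._write_batch(batch)
            except Exception:
                # e.g. database downtime or a network error: back off, then retry
                if attempt + 1 < self._retries:
                    self._sleep(self._backoff * 2 ** attempt)  # exponential backoff
                continue
            self._dirty.clear()  # persisted (single-threaded sketch, no lock)
            return True
        return False
```

With a writer that fails twice and then succeeds, `flush` sleeps 0.5 s and then 1.0 s, and finally writes one batch containing only the latest value for each key.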
Can anyone explain the differences between a replicated database and a mirrored database server?
I have huge reports to run. I want to run them on a secondary database server so I can offload work from the primary server.
Should I setup a replication server or a mirrored server and why?
For your requirements, replication is the way to go (assuming you're talking about transactional replication). As stated before, mirroring will "mirror" the whole database, but you won't be able to query it unless you create snapshots from it.
The good thing about replication is that you can choose which objects to replicate and also filter them. Since the replicated database is open, you can delete information that isn't required (just be careful, as this can cause problems maintaining the replication itself) or create indexes specifically for the reports that aren't needed in "production". I maintained this kind of solution for a long time with no issues.
(Assuming you are referring to Transactional Replication)
The biggest differences are: 1) Replication operates on an object-by-object basis whereas mirroring operates on an entire database. 2) You can't query a mirrored database directly - you have to create snapshots based on the mirrored copy.
In my opinion, mirroring is easier to maintain, but the constant creation of snapshots may prove to be a hassle.
As mentioned here
Database mirroring and database replication are two high data availability techniques for database servers. In replication, data and database objects are copied and distributed from one database to another. It reduces the load from the original database server, and all the servers on which the database was copied are as active as the master server. On the other hand, database mirroring creates copies of a database in two different server instances (principal and mirror). These mirror copies work as standby copies and are not always active like in the case of data replication.
This question may also be helpful, or have a look at the MS documentation.
Hi, can anybody tell me what replication is used for in SQL Server 2005?
Backup and replication look the same. What is the difference between them?
Backups are exactly that: backups. They enable you to recover the data if something bad happens.
Replication is another beast entirely. It basically distributes the data across multiple nodes so that each node has a complete, (close to) up-to-date copy of the data.
There are a number of reasons why you would use replication including, but not limited to:
High availability so that, if one node goes down, other nodes can still service requests.
Geographical distribution, meaning your data can be placed close to those that need it. Clients in Belarus don't need to go all the way to Montana to get the data if you maintain a local replica in Belarus (or somewhere close) - this is for performance. You may have 10,000 clients in Belarus - it's quicker to send one copy over than have all 10,000 request data [although this depends on how often they request data].
Prioritization. If your reporting users (bank management) have a lower service level agreement than your customer-facing staff (bank tellers) [and they should], you can put all the management onto a replica so as not to slow down the primary copy.
Replication is used for a different purpose, for example to make reports without putting that load on the 'real' database.
Replication increases system availability. If one database server is down, you can serve requests from a replica.
Backup saves you from catastrophic errors, such as a human error that drops the production database. Note that in this case replication won't save you, as it will dutifully replicate the DROP command.
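A toy illustration of that last point, using SQLite: the "replica" is kept in sync by re-running every statement on it (a deliberately simplified, statement-based stand-in for real replication, not how SQL Server does it), while the backup is a point-in-time copy made with `sqlite3.Connection.backup`. After an accidental `DROP TABLE`, the replica loses the table too; the backup still has it.

```python
import sqlite3


class ToyReplicatedDb:
    """Primary plus one replica that receives every statement the primary runs."""

    def __init__(self) -> None:
        self.primary = sqlite3.connect(":memory:")
        self.replica = sqlite3.connect(":memory:")

    def execute(self, sql: str, params: tuple = ()) -> None:
        # Apply the same statement to both copies, mistakes included.
        for conn in (self.primary, self.replica):
            conn.execute(sql, params)
            conn.commit()

    def backup(self) -> sqlite3.Connection:
        # Point-in-time copy of the primary; later statements don't touch it.
        copy = sqlite3.connect(":memory:")
        self.primary.backup(copy)
        return copy


def tables(conn: sqlite3.Connection) -> list:
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

Create a table, take a backup, then run `DROP TABLE` through `execute`: `tables()` is empty on both the primary and the replica, but the backup still lists the table with its rows. Restoring from the backup is the only way back, which is why replication and backup complement each other rather than substitute for each other.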
SQL Server replication is the process of distributing data from a source database to one or more destination databases throughout the enterprise.
Replication is a great solution for maintaining a reporting server.
Clients at the site to which the data is replicated experience improved performance because those clients can access data locally rather than connecting to a remote database server over a network.
Clients at all sites experience improved availability of replicated data. If the local copy of the replicated data is unavailable, clients can still access the remote copy of the data.
Replication: Lots of data, fast and most recent.
Backup/Restore: Some data, perhaps a bit slower, and a specific point in time.
Replication can be used to address a number of different scenarios as detailed below.
Just to be clear, however: replication is not the same as database backup.
Scenarios:
Server to server: Replicating Data in a Server to Server Environment
Improving Scalability and Availability
Data Warehousing and Reporting
Integrating Data from Multiple Sites (Server)
Integrating Heterogeneous Data
Offloading Batch Processing
Server to client: Replicating Data Between a Server and Clients
Exchanging Data with Mobile Users
Consumer Point of Sale (POS) Applications
Integrating Data from Multiple Sites (Client)
For a full overview of Microsoft SQL Server Replication see the following Microsoft reference.
http://msdn.microsoft.com/en-us/library/ms151198(SQL.90).aspx
Choose the track that is most appropriate to you (i.e. Developer / Architect) and all shall be revealed :-)