I developed an app that uses two different databases. The first database (D1) is my main database, where my most up-to-date data is kept and which is accessible to all applications. The second database (D2) is the one where I can continue to perform operations when the user (U) is not connected to the Internet.
Now to my question: how should I detect missing data between these two databases? Let me explain my method. I continue to pull data from and send data to D1, while at the same time writing my data to D2. That way, D2 holds a complete copy of the data in case of an internet outage. I mark the actions performed while there is no internet connection with the help of a column in the database. When I reconnect to the internet, I transmit the marked data to D1 and make those transactions accessible to other users.
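For illustration, the marking approach described above might look roughly like this (a minimal sketch, assuming SQLite stands in for the local D2, a `synced` flag column, and a hypothetical `d1_insert` callable for the remote write to D1):

```python
import sqlite3

# Local offline store (D2): every write is flagged as not-yet-synced.
def write_locally(d2: sqlite3.Connection, order_id: int, payload: str) -> None:
    d2.execute(
        "INSERT INTO orders (id, payload, synced) VALUES (?, ?, 0)",
        (order_id, payload),
    )
    d2.commit()

# When connectivity returns, push every unsynced row to the central D1
# and clear the flag only after D1 confirms the write.
def sync_to_d1(d2: sqlite3.Connection, d1_insert) -> None:
    rows = d2.execute(
        "SELECT id, payload FROM orders WHERE synced = 0"
    ).fetchall()
    for order_id, payload in rows:
        d1_insert(order_id, payload)  # remote call to D1 (assumed to raise on failure)
        d2.execute("UPDATE orders SET synced = 1 WHERE id = ?", (order_id,))
        d2.commit()
```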
Is the structure I want to build right? Is there a better way? Can you help me?
The current single application server can handle about 5,000 concurrent requests. However, the user base will be in the millions, and I may need two application servers to handle the load.
So the design is to add a load balancer, in the hope of handling over 10,000 concurrent requests. However, all the users' data is stored in one single database. If the design moves to two or more servers, should I do the following?
Have two database instances
Real-time sync between the two databases
Is this correct?
If so, will the sync process lower the performance of the servers, since database replication seems costly?
Thank you.
You probably want to think of your service in "tiers". In this instance, you've got two tiers: the application tier and the database tier.
Typically, your application tier is going to be considerably easier to scale horizontally (i.e. by adding more application servers behind a load balancer) than your database tier.
With that in mind, the best approach is probably to overprovision your database (i.e. put it on its own, meaty server) and have your application servers all connect to that same database. Depending on the database software you're using, you could also look at using read replicas (AWS docs) to reduce the strain on your database.
You can also look at caching via Memcached / Redis to reduce the amount of load you're placing on the database.
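For example, a cache-aside read in front of that shared database might look like this (a sketch, assuming the `redis` Python client and a hypothetical `query_db` helper):

```python
import json
import redis

cache = redis.Redis(host="cache.internal", port=6379)  # placeholder host

def get_user(user_id: int, query_db) -> dict:
    """Cache-aside: serve from Redis when possible, fall back to the DB."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    user = query_db(user_id)                  # hits the single, beefy DB server
    cache.set(key, json.dumps(user), ex=300)  # keep it warm for 5 minutes
    return user
```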
So, tl;dr: put your DB on its own big server, and spread your application code across many small servers, all connecting to that same DB server.
The best option could be synchronizing a standby node with data from the active node; it is a cost-effective solution, since it is achievable with an open-source relational database (e.g. MariaDB).
Do not store computed results and statistics that can easily be derived at run time; this helps reduce the data size.
If historical data is not urgently needed for queries, it can be written to a text file in a format that is easy to import into the database (e.g. .csv).
Data objects that are updated very often can be kept in an in-memory database as key-value pairs; use a scheduled task to perform batch updates/inserts into the relational database to achieve persistence (see the sketch after this list).
Implement retry logic for the database batch-update tasks to handle DB downtime or network errors.
Consider writing data to the relational database as serialized objects.
Cache configuration data in memory from the database, refreshing it either periodically or via an API call when parts of it change.
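A rough sketch of the batch-update and retry suggestions above (assuming Redis as the in-memory store and a hypothetical `bulk_upsert` helper; key names are made up):

```python
import time
import redis

r = redis.Redis()

def flush_counters(bulk_upsert, max_retries: int = 3) -> None:
    """Scheduled task: move frequently updated counters from Redis
    into the relational database in one batch, retrying on failure."""
    keys = r.keys("counter:*")                      # hypothetical key pattern
    batch = [(k.decode(), int(r.get(k))) for k in keys]
    if not batch:
        return
    for attempt in range(1, max_retries + 1):
        try:
            bulk_upsert(batch)        # single batch INSERT/UPDATE against the relational DB
            return
        except Exception:             # DB downtime or network error
            if attempt == max_retries:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff
```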
We are building a web application and plan to run it on AWS. We created an RDS instance with MySQL. The proposed architecture is as follows:
Data is uploaded from the company data mart to a Core DB in RDS. On the other side, users send data through our REST API. This user-input data will be saved in a separate DB within the same RDS instance, as one of our architects suggested. The data will then be periodically copied to a table inside the Core DB. We will have a rule engine running against the Core DB. Whenever an exception is detected, a notification will be sent to customers.
The overall structure seems fine. One thing I would change, though, is that instead of having two separate DBs, we could just have one DB and put the user-input data in a table in that same database. The logic behind separate DBs, according to our architect, is security: since the Core DB holds our company's data, it is better off on its own, so the HTTP requests from clients only touch the user-input DB.
While that makes sense, I am not sure it is really necessary. First, all user input is authenticated. Second, the web API provides another protection layer in front of the database, since it only allows certain requests, which in this case is a couple of POST endpoints. Besides, if someone can somehow still hack into the User Input DB in RDS, then, since it resides on the same RDS instance and there is data transfer between the DBs, it is not impossible that they could get to the Core DB.
That said, do we really need separate DBs? And if this is the way to go, what is the best way to sync from the User Input DB to a User Input table in the Core DB?
Separating the DBs does not magically make things secure. My suggestions:
Restrict the API layer, for example to write access only (to guard against accidentally deleting data)
Don't put credentials in source code; put them in environment variables, for example Elastic Beanstalk environment variables
For RDS itself, put it inside a VPC
In terms of synchronizing data, if you have to go with two DBs:
If your two databases have exactly the same schema, you can use the DB's replication capability (such as MySQL replication)
If not, you can send the data to a message broker service (SQS), then create a worker that pulls it and saves it to the target database (see the sketch after this list)
Or you can use another service such as AWS Data Pipeline
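For the SQS option, the worker could be sketched roughly like this (assuming the `boto3` package and a hypothetical `save_to_core_db` function; the queue URL is a placeholder):

```python
import json
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/user-input"  # placeholder

def worker_loop(save_to_core_db) -> None:
    """Pull user-input messages off SQS and write them into the Core DB."""
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,   # long polling
        )
        for msg in resp.get("Messages", []):
            save_to_core_db(json.loads(msg["Body"]))
            # Delete only after the Core DB write succeeded.
            sqs.delete_message(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=msg["ReceiptHandle"],
            )
```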
The system in question is for a company with multiple locations. Unreliable internet speed/availability at some locations has led us down the path of a local server at each location, off of which that location runs, plus a central server.
The role of the local server is to let each location keep running whether or not it is connected to the outside world, and to eliminate high latency when the connection speed is less than optimal.
The role of the central server is two-fold:
Configuration, policy, user, etc, management. For example, new products, price changes, promotions, user changes, etc, are done on the central server and then distributed to the local servers so they have the most up to date info.
Centralize all data created at each location to run reports, analytics and warehouse data.
How much data to keep on the local server is debatable. For example, some processes depend on more than just that one location, like customer loyalty, so a query must be run against the central server to check user activity and determine incentives. On the other hand, the active customer base should be within the scope of the local server's data.
I lack experience with these types of distributed systems. My question is: what database should we use to facilitate this type of setup, ideally with built-in functionality to handle the data syncs to/from the central server automatically, without much custom coding?
Master-Slave Replication:
In this type of replication, one server (the master) accepts writes and replicates the changes to read replicas (slaves).
Characteristics
Asynchronous
Read Scalability
The master is a single point of failure (SPOF) for all the nodes
Master-Master:
In this setup, all the database servers accept reads and writes and synchronize with each other.
Characteristics
Synchronous (hopefully)
Read and Write Scalability
Performance is worse than Master-Slave
No SPOF
Master-Master is harder to set up and maintain, and there is the possibility of ID collisions.
Any popular database server these days supports the features above.
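On the application side, master-slave replication typically shows up as simple read/write routing, along these lines (a sketch, assuming the `mysql-connector-python` package; host names and credentials are placeholders):

```python
import random
import mysql.connector  # assumes the mysql-connector-python package

master = mysql.connector.connect(host="db-master", user="app", password="...", database="shop")
replicas = [
    mysql.connector.connect(host=h, user="app", password="...", database="shop")
    for h in ("db-replica-1", "db-replica-2")
]

def execute(sql: str, params=()):
    """Route writes to the master, spread reads across the replicas."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    conn = random.choice(replicas) if is_read else master
    cur = conn.cursor()
    cur.execute(sql, params)
    if not is_read:
        conn.commit()
    return cur.fetchall() if is_read else None
```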
I'm looking for a little advice.
I have some SQL Server tables I need to move to local Access databases for some local production tasks - once per "job" setup, w/400 jobs this qtr, across a dozen users...
A little background:
I am currently using a DSN-less approach to avoid distribution issues
I can create temporary LINKS to the remote tables and run "make table" queries to populate the local tables, then drop the linked tables. Works as expected.
Performance here in US is decent - 10-15 seconds for ~40K records. Our India teams are seeing >5-10 minutes for the same datasets. Their internet connection is decent, not great and a variable I cannot control.
I am wondering if MS Access is adding some overhead here that can be avoided by a more direct approach: i.e., letting the server do all/most of the heavy lifting instead of Access?
I've tinkered with various combinations, with no clear improvement or success:
Parameterized stored procedures from Access
SQL Passthru queries from Access
ADO vs DAO
Any suggestions, or an overall approach to suggest? How about moving data as XML?
Note: I have Access 7, 10, 13 users.
Thanks!
It's not entirely clear but if the MSAccess database performing the dump is local and the SQL Server database is remote, across the internet, you are bound to bump into the physical limitations of the connection.
ODBC drivers are not meant to be used for data access beyond a LAN, there is too much latency.
When Access queries data, it doesn't open a stream: it fetches a block, waits for the data to be downloaded, then requests another batch. This is OK on a LAN but quickly degrades over long distances, especially when you consider that communication between the US and India probably has around 200ms latency and you can't do much about it; that adds up very quickly if the communication protocol is chatty, all on top of a connection bandwidth that is very likely well below what you would get on a LAN.
The better solution would be to perform the dump locally and then transmit the resulting Access file after it has been compacted and maybe zipped (using 7z for instance for better compression). This would most likely result in very small files that would be easy to move around in a few seconds.
The process could easily be automated. The easiest approach is maybe to perform this dump automatically every day and make it available on an FTP server or an internal website, ready for download.
You can also make it available on demand, maybe through an app running on a server and made available through RemoteApp using RDP services on a Windows 2008 server, or simply through a website or a shell.
You could also have a simple Windows service on your SQL Server that listens for requests from a remote client installed on the local machines everywhere; it would produce the dump and send it to the client, which would then unpack it and replace the previously downloaded database.
Plenty of solutions for this, even though they would probably require some amount of work to automate reliably.
One final note: if you automate the data dump from SQL Server to Access, avoid using Access in an automated way. It's hard to debug and quite easy to break. Use an export tool instead that doesn't rely on having Access installed.
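As one example of such an export tool, a small script running on or near the SQL Server could dump the needed tables to CSV and zip them for download (a sketch, assuming the `pyodbc` package; the connection string, table, and file names are placeholders):

```python
import csv
import zipfile
import pyodbc

def dump_table_to_zip(table: str, zip_path: str) -> None:
    """Export one SQL Server table to CSV and compress it for transfer."""
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sql01;DATABASE=jobs;Trusted_Connection=yes;"
    )
    cur = conn.cursor()
    cur.execute(f"SELECT * FROM {table}")          # table name assumed trusted
    csv_path = f"{table}.csv"
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # header row
        writer.writerows(cur.fetchall())
    # Compress; the .zip is what gets published on the FTP server / website.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.write(csv_path)

dump_table_to_zip("JobRecords", "JobRecords.zip")
```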
Renaud and all, thanks for taking the time to provide your responses. As you note, performance across the internet is the bottleneck. The fetching of blocks of data (vs a contiguous download) is exactly what I was hoping to avoid via an alternate approach.
Our workflow is evolving to better leverage both sides of the clock: User1 in the US completes their day's efforts in the local DB and then sends JUST their updates back to the server (based on timestamps). User2 in India, who also has a local copy of the same DB, grabs just the updated records off the server at the start of his day. So, pretty efficient for day-to-day stuff.
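That timestamp-based push could be sketched roughly as follows (assuming generic DB-API cursors with '?' placeholders, a `last_modified` column, and a made-up `work_items` table; the pull direction would mirror it):

```python
from datetime import datetime, timezone

def push_updates(local_cur, server_cur, since: datetime) -> datetime:
    """Send only the rows this user changed locally since the last sync."""
    local_cur.execute(
        "SELECT id, payload, last_modified FROM work_items WHERE last_modified > ?",
        (since,),
    )
    for row_id, payload, modified in local_cur.fetchall():
        server_cur.execute(
            "UPDATE work_items SET payload = ?, last_modified = ? WHERE id = ?",
            (payload, modified, row_id),
        )
    return datetime.now(timezone.utc)  # the new sync watermark for next time
```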
The primary issue is the initial download of the local DB tables from the server (a huge multi-year DB) for the current "job"; this should happen just once at the start of the effort (a roughly week-long process). This is the piece that takes 5-10 minutes for India to accomplish.
We currently do move the DB back and forth via FTP, DAILY. It is used as a SINGLE shared DB and is a bit LARGE due to temp tables. I was hoping my new timestamp-based push-pull of just the daily changes would be an overall plus. It seems to be, but the initial download hurdle remains.
In a Firebird-database-driven Delphi application, we need to bring some data online so we can add online reporting capabilities to our application.
The current approach is: whenever data is changed or added, send it to the online server (PHP + MySQL); if that fails, add it to a queue and try again. The server holding the data is then able to create its own reports.
So, to conclude: what is a good way to bring that data online?
At the moment I know of these two different strategies:
Event-based: whenever changes are detected, push them to the web server / MySQL DB. As you wrote, this requires queueing in case the destination system does not receive the messages.
Snapshot-based: extract the relevant data at intervals (for example every hour) and transfer it to the web server / MySQL DB.
The snapshot-based strategy allows you to preprocess the data so that it fits nicely into the web / MySQL DB data structure, which can help decouple the systems better and keep more business logic on the side of the sending system (Delphi). It also generates a more even load, since mass data changes do not trigger bursts of transfers.
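A snapshot-based transfer could be sketched along these lines (assuming the `fdb` Firebird driver and the `mysql-connector-python` package; table, column, and connection details are made up):

```python
import fdb               # Firebird driver (assumption)
import mysql.connector   # MySQL driver (assumption)

def transfer_snapshot(since):
    """Run on an interval: copy rows changed since the last snapshot
    from the Firebird source into the MySQL reporting database."""
    fb = fdb.connect(dsn="server:/data/app.fdb", user="SYSDBA", password="...")
    my = mysql.connector.connect(host="webhost", user="report",
                                 password="...", database="reports")

    src = fb.cursor()
    src.execute("SELECT id, total, changed_at FROM sales WHERE changed_at > ?", (since,))

    dst = my.cursor()
    for row in src.fetchall():
        # Upsert so re-running a snapshot never duplicates rows.
        dst.execute(
            "REPLACE INTO sales (id, total, changed_at) VALUES (%s, %s, %s)",
            row,
        )
    my.commit()
```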
Another way would be to use replication, but I don't know of a system that replicates between Firebird and MySQL databases.
For adding online reporting capabilities, you can also check out FastReport Server.