Requirement: Access a very large SQL database during business hours to pull the latest data every 2-4 hours, then store it in a local database. Another tool will access that local database for further processing.
Question: What is the best option for accessing this busy database without impacting its performance? The pull requires only SELECT statements, possibly joining a few tables for selected columns, but we need to connect to that busy database every 2-4 hours to pull the latest data.
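For illustration only, an incremental pull of this kind might look something like the sketch below, assuming SQL Server and made-up table, column, and parameter names:

```sql
-- Minimal sketch (T-SQL); table, column, and parameter names are hypothetical.
-- @LastPulledAt is the watermark saved with the previous local load.
-- Snapshot isolation avoids holding shared locks on the busy source database
-- (requires ALLOW_SNAPSHOT_ISOLATION to be enabled there).
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

DECLARE @LastPulledAt datetime2 = '2024-01-01T00:00:00';  -- placeholder value

SELECT o.OrderId, o.CustomerId, o.ModifiedDate, c.Region
FROM   dbo.Orders    AS o
JOIN   dbo.Customers AS c ON c.CustomerId = o.CustomerId
WHERE  o.ModifiedDate > @LastPulledAt;                    -- only rows changed since the last pull
```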
I was reviewing the Oracle setup at our place and have a few questions on best practices. We are using Oracle 12c and planning to move to 19c by the end of the year, so I'm open to answers for both versions if there is a difference.
In a microservice architecture we have 10 apps, each interacting with 10 tables of its own. Is it better to have 10 different databases or 10 different users/schemas in the same database?
All the tables together hold 31 TB of data. It's mentioned that an Oracle 12c bigfile tablespace can only grow to 32 TB. Is that a real Oracle limitation, i.e. that it can't grow further?
Regarding tablespaces: currently all objects are stored in one tablespace and all indexes in a second. Is that a good strategy? Is there something better, such as a separate tablespace per user/microservice, or storing CLOB objects in one tablespace and everything else in another?
Is there an out-of-the-box purge or archive solution? I have seen the row-level archival option, which basically turns a flag on or off. Ideally I would like functionality where, every weekend, data more than a year old gets purged or archived automatically.
There is a table of orders that need to be fulfilled within the next 3 months; once they are past their delivery date they remain in the table for SELECTs only and are never updated. I was thinking of partitioning such a table. Each of these tables holds about 100 GB of data. Will partitioning help, and what kind of strategy will work best for this use case? (A possible layout is sketched below.)
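For illustration, a date-based partitioning layout of the kind being considered might look like the sketch below (Oracle syntax; the table and its columns are hypothetical):

```sql
-- Sketch only. Interval partitioning creates one partition per month as rows
-- arrive, so past-delivery-date (read-only) months end up isolated in their
-- own partitions, which can later be compressed, moved, or dropped.
CREATE TABLE orders_part (
  order_id      NUMBER       PRIMARY KEY,
  customer_id   NUMBER       NOT NULL,
  delivery_date DATE         NOT NULL,
  status        VARCHAR2(20)
)
PARTITION BY RANGE (delivery_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
( PARTITION p_initial VALUES LESS THAN (DATE '2020-01-01') );
```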
We have a database that is not performing well and I am hoping to get some advice on the best way to re-design it. The database/application comes from a third-party vendor, and for the moment cannot be changed.
Currently we have a local distributor set up to serve about 80,000 reports per month of varying complexity (I know, how complex or simple is each one? The number is more an indication than an actual load assessment). We pull data from a number of different real-time (x4) and transactional (x3) databases across a WAN on a minute-by-minute basis and then transform that data into the schema. We have dashboards (an installed .NET client) and MSRS reporting. There is also some minor data entry.
As you may have guessed, the server is struggling.
We are looking to move to SQL Server 2014.
There are two options we are considering:
An Availability Group (AG) separating the primary from an active (readable) secondary.
Splitting Publisher and Distributor and using some form of replication (Transactional?) to push the data to the distributors.
Which would make more sense?
Also, each object on every dashboard calls its own query. If 10 people from 3 different geographic locations are running the same dashboard, they will each be running the same query, and these will refresh every 2 mins.
I have two databases: a CRM database (Microsoft Dynamics CRM) and a company database.
These two databases are different.
How can I copy the company database (all objects) into the CRM database every 5 seconds?
Thanks
The cheapest way to do this (and one of the easiest) is a method called log shipping. On a schedule (even every 5 minutes or so) it copies the log backup to another machine and restores it to the target database. Ignore anyone who claims it can be done every minute: it takes a little while to close the log backup file, move it, and reapply it, but a 5-10 minute window is achievable. (A rough sketch of the cycle follows.)
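A minimal sketch of that cycle, assuming SQL Server, a hypothetical CompanyDb database, and made-up share paths:

```sql
-- Rough T-SQL sketch of one log-shipping cycle; database names, share paths,
-- and file names are hypothetical. The target copy must have been seeded from
-- a full backup restored WITH NORECOVERY (or STANDBY) first.

-- On the source server, every N minutes:
BACKUP LOG CompanyDb
TO DISK = N'\\fileshare\logship\CompanyDb_log.trn';

-- On the target server, after the file has been copied across:
RESTORE LOG CompanyDb_Copy
FROM DISK = N'\\fileshare\logship\CompanyDb_log.trn'
WITH STANDBY = N'D:\logship\CompanyDb_undo.dat';  -- STANDBY keeps the copy readable between restores
```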
You can also use mirroring, transactional replication, and other high-availability solutions, but there is no easy way to keep two machines fully in sync.
Do you need to duplicate the data? Can't you query the source system directly if they're on the same server?
Otherwise this might point you in the right direction: Keep two databases synchronized with timestamp / rowversion
Background:
I am developing an application that allows users to generate lots of different reports. The data is stored in PostgreSQL and has a natural group key, so data with one group key is totally independent of data with other group keys. Reports are built for one group key at a time, so every query uses a "WHERE groupKey = X" clause. The data in PostgreSQL is updated intensively by parallel processes that add data to different groups, but I don't need real-time reports; one update per 30 minutes is fine.
Problem:
There are about 4 GB of data already, and I've found that some reports take significant time to generate (up to 15 seconds) because they need to query not one table but 3-4 of them.
I want to reduce the time it takes to create a report without significantly changing the technologies or the schema of the solution.
Possible solutions
What I was thinking about this is:
Splitting the one database into several, one database per group key. Then I could get rid of WHERE groupKey = X (though I have an index on that column in each table), and the number of rows to process each time would be significantly smaller.
Creating a slave database for reads only. Then I would have to sync the data with PostgreSQL's replication mechanism, for example once per 15 minutes. (Can I actually do that, or do I have to write custom code?)
I don't want to switch to a NoSQL database, because I would have to rewrite all the SQL queries. I might switch to another SQL database with column-store support if it is free and runs on Windows (sorry, I don't have a Linux server, but could get one if I had to).
Your ideas
What would you recommend as the first simple steps?
Two thoughts immediately come to mind for reporting:
1) Set up some summary (a.k.a. "aggregate") tables that hold precomputed results of the queries your users are likely to run, e.g. a table containing the counts and sums grouped by the various dimensions. This can be automated: a DB function (or script) run via your job scheduler of choice refreshes the data every N minutes.
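A materialized view is one way to implement such a summary in PostgreSQL (9.3+); a plain table rebuilt by a scheduled function works just as well. The table and column names below are hypothetical:

```sql
-- The "events" table and its columns stand in for the real report sources.
CREATE MATERIALIZED VIEW report_daily_totals AS
SELECT group_key,
       date_trunc('day', event_date) AS day,
       count(*)                      AS event_count,
       sum(amount)                   AS total_amount
FROM   events
GROUP  BY group_key, date_trunc('day', event_date);

-- A unique index lets REFRESH ... CONCURRENTLY (9.4+) rebuild the view
-- without blocking readers; run the refresh from your scheduler every N minutes.
CREATE UNIQUE INDEX ON report_daily_totals (group_key, day);
REFRESH MATERIALIZED VIEW CONCURRENTLY report_daily_totals;
```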
2) Regarding replication: if you use streaming replication (PostgreSQL 9+), changes in the master DB are replicated to slave databases (hot standby = read-only), which you can then use for reporting.
Tune the report queries: use EXPLAIN (see the sketch after this list), and avoid procedural code when you can do it in pure SQL.
Tune the server: memory, disk, processor. Take a look at the server configuration.
Upgrade the PostgreSQL version.
Run VACUUM.
Of these four, only the first requires significant changes in the application.
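As a minimal example of the EXPLAIN step mentioned in the first item (the table and filter below are placeholders for a real report query):

```sql
-- Inspect a slow report query's plan and actual run time in PostgreSQL.
EXPLAIN (ANALYZE, BUFFERS)
SELECT r.group_key, sum(r.amount)
FROM   report_rows AS r
WHERE  r.group_key = 42
GROUP  BY r.group_key;
```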
I'm currently working on a POS project. Users require the application to work both online and offline, which means they need a local database. I decided to use SQL Server replication between each shop and the head office. Each shop needs to install SQL Server Express, and the head office already has SQL Server Enterprise Edition. Replication runs on a 30-minute schedule, and I chose merge replication because data can change at both the shop and the head office.
During my POC I found that this solution does not work properly: sometimes the job errors out and I need to re-initialize it. It also takes a very long time, which is obviously unacceptable to the users.
I want to know: is there a better solution than the one I'm using now?
Update 1:
The constraints of the system are:
Most transactions can occur at both the shop and the head office.
Some transactions need to work in real time; that is, after a user saves data at their local shop, that data should also be updated at the head office (if they're currently online).
Users can keep working even if their shop is disconnected from the head office database.
We estimate at most 2,000 rows of data per day.
The head office server runs Windows Server 2003 and all clients run Windows XP.
Update 2:
Currently there are about 15 clients, but this number will grow at a fairly slow rate.
The data volume is about 100 to 200 rows per replication cycle; I think it is no more than 5 MB.
Clients connect to the server over a 128 kbps leased line.
My situation is that replication takes a very long time (about 55 minutes, when we only have 5 minutes or so), and most of the time I need to re-initialize the job to get replication started again; if I don't re-initialize it, it won't replicate at all. In my POC I found that it always takes a very long time to replicate after a re-initialization, and the time doesn't depend on the amount of data. Re-initializing is the only thing I've found that works around the problem.
From the above, I conclude that merge replication may not be suitable for my problem, and that there may be a better solution that satisfies the needs listed in Update 1.
Sounds like you may need to roll your own bi-directional replication engine.
Part of the reason things take so long is that over such a narrow link (128 kbps), the two databases have to be made consistent (so all rows need to be checked) before replication can start. As you can imagine, this can (and does) take a long time. Even 5 MB takes around five minutes to transfer over this link (5 MB is roughly 40,000 kbit, and 40,000 / 128 is about 310 seconds).
When writing your own engine, decide what needs to be replicated (using timestamps recording when items changed), figure out conflict resolution (what happens if the same record changed in both places between replication runs), and more. This is not easy; a rough sketch of the idea follows.
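A very rough sketch of what the timestamp pickup and a simple last-write-wins rule could look like (T-SQL; every table, column, and parameter name is hypothetical, and the parameters would be supplied by the sync program):

```sql
-- 1) At the shop: collect rows changed since the last successful sync.
SELECT ItemId, Qty, LastModifiedUtc
FROM   dbo.StockItems
WHERE  LastModifiedUtc > @LastSyncUtc;

-- 2) At head office: apply a change only if it is newer than the row already
--    there (last write wins); older incoming changes lose the conflict.
UPDATE dbo.StockItems
SET    Qty             = @Qty,
       LastModifiedUtc = @LastModifiedUtc
WHERE  ItemId = @ItemId
  AND  LastModifiedUtc < @LastModifiedUtc;
```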
My suggestion is to use MS Access locally and push data to the server at a set interval. Add an "updated" column to every table, and set it whenever a record is added or updated. For deletions you need a separate table holding the primary key value and table name. When synchronizing, fetch all local records whose "updated" column is set and apply them (insert or update) to the central server, then replay the deletions from the local deleted-rows table, and you are done (see the sketch below).
I assume that your central server is only for collecting data.
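A sketch of the tracking structures described above (written as SQL Server-flavoured SQL; Access DDL and types differ slightly, and all names are made up):

```sql
-- Flag column set on every insert/update of a hypothetical Sales table.
ALTER TABLE Sales ADD IsUpdated BIT NOT NULL DEFAULT 1;

-- Tombstone table for deletions.
CREATE TABLE DeletedRows (
    TableName VARCHAR(128) NOT NULL,
    KeyValue  VARCHAR(64)  NOT NULL,
    DeletedAt DATETIME     NOT NULL DEFAULT GETDATE()
);

-- During each synchronisation run:
SELECT * FROM Sales WHERE IsUpdated = 1;              -- push these rows to the central server
UPDATE Sales SET IsUpdated = 0 WHERE IsUpdated = 1;   -- then clear the flag locally
SELECT TableName, KeyValue FROM DeletedRows;          -- replay these deletions centrally, then empty the table
```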
I currently do exactly what you describe using SQL Server Merge Replication configured for Web Synchronization. I have my agents run on a 1-minute schedule and have had success.
What kind of error messages are you seeing?