We have a database that is not performing well and I am hoping to get some advice on the best way to re-design it. The database/application comes from a third-party vendor, and for the moment cannot be changed.
Currently we have a local distributor set up to serve about 80,000 reports per month of different complexities (I know - how complex or simple is each one - the number is more by way of indication than an actual load assessment). We are pulling data from a number of different real time (x4) and transactional databases (x3) across a WAN on a minute by minute basis and then transforming that data into the schema. We have dashboards (.NET installed client) and MSRS reporting. There is also some minor data entry.
As you may have guessed, the server is struggling.
We are looking to move to SQL Server 2014.
There are two options we are considering:
AG separation of Primary and active Secondary.
Splitting Publisher and Distributor and using some form of replication (Transactional?) to push the data to the distributors.
Which would make more sense?
Also, each object on every dashboard calls its own query. If 10 people from 3 different geographic locations are running the same dashboard, they will each be running the same query, and these will refresh every 2 mins.
Related
Looking for suggesting on loading data from SQL Server into Elasticsearch or any other data store. The goal is to have transactional data available in real time for Reporting.
We currently use a 3rd party tool, in addition to SSRS, for data analytics. The data transfer is done using daily batch jobs and as a result, there is a 24 hour data latency.
We are looking to build something out that would allow for more real time availability of the data, similar to SSRS, for our Clients to report on. We need to ensure that this does not have an impact on our SQL Server database.
My initial thought was to do a full dump of the data, during the weekend, and writes, in real time, during weekdays.
Thanks.
ElasticSearch's main use cases are for providing search type capabilities on top of unstructured large text based data. For example, if you were ingesting large batches of emails into your data store every day, ElasticSearch is a good tool to parse out pieces of those emails based on rules you setup with it to enable searching (and to some degree querying) capability of those email messages.
If your data is already in SQL Server, it sounds like it's structured already and therefore there's not much gained from ElasticSearch in terms of reportability and availability. Rather you'd likely be introducing extra complexity to your data workflow.
If you have structured data in SQL Server already, and you are experiencing issues with reporting directly off of it, you should look to building a data warehouse instead to handle your reporting. SQL Server comes with a number of features out of the box to help you replicate your data for this very purpose. The three main features to accomplish this that you could look into are AlwaysOn Availability Groups, Replication, or SSIS.
Each option above (in addition to other out-of-the-box features of SQL Server) have different pros and drawbacks. For example, AlwaysOn Availability Groups are very easy to setup and offer the ability to automatically failover if your main server had an outage, but they clone the entire database to a replica. Replication let's you more granularly choose to only copy specific Tables and Views, but then you can't as easily failover if your main server has an outage. So you should read up on all three options and understand their differences.
Additionally, if you're having specific performance problems trying to report off of the main database, you may want to dig into the root cause of those problems first before looking into replicating your data as a solution for reporting (although it's a fairly common solution). You may find that a simple architectural change like using a columnstore index on the correct Table will improve your reporting capabilities immensely.
I've been down both pathways of implementing ElasticSearch and a data warehouse using all three of the main data synchronization features above, for structured data and unstructured large text data, and have experienced the proper use cases for both. One data warehouse I've managed in the past had Tables with billions of rows in it (each Table terabytes big), and it was highly performant for reporting off of on fairly modest hardware in AWS (we weren't even using Redshift).
I have an ASP.NET MVC 4 application that uses a database in US or Canada, depending on which website you are on.
This program lets you filter job data on various filters and the criteria gets translated to a SQL query with a good number of table joins. The data is filtered then grouped/aggregated.
However, now I have a new requirement: query and do some grouping and aggregation (avg salary) on data in both the Canada Server and US server.
Right now, the lookup tables are duplicated on both database servers.
Here's the approach I was thinking:
Run the query on the US server, run the query again on the Canada server and then merge the data in memory.
Here is one use case: rank the companies by average salary.
In terms of the logic, I am just filtering and querying a job table and grouping the results by company and average salary.
Would are some other ways to do this? I was thinking of populating a reporting view table with a nightly job and running the queries against that reporting table.
To be honest, the queries themselves are not that fast to begin with; running, the query again against the Canada database seems like it would make the site much slower.
Any ideas?
Quite a number of variables here. If you don't have too much data then doing the queries on each DB and merging is fine so long as you get the database to do as much of the work as it is able to (i.e. the grouping, averaging etc.).
Other options include linking your databases and doing a single query but there are a few downsides to this including
Having to link databases
Security associated with a linked database
A single query will require both databases to be online, whereas you can most likely work around that with two queries
Scheduled, prebuilt tables have some advantages & disadvantages but probably not really relevant to the root problem of you having 2 databases where perhaps you should have one (maybe, maybe not).
If the query is quite slow and called many times, the a single snapshot once could save you some resources provided the data "as at" the time of the snapshot is relevant and useful to your business need.
A hybrid is to create an "Indexed View" which can let the DB create a running average for you. That should be fast to query and relatively unobtrusive to keep up to date.
Hope some of that helps.
We have an application which needs to interact with 3 different databases
(SQL Server) to fetch the user details and display them on a web page. Is it a good option to use a linked server or should we copy the user details (via some daily job) to the application database?
Using a linked server will give you a round trip delay every time you query the data. If you only query the data once per day or per session this might be acceptable. If however you are issuing many queries to these servers you may find that the performance is so poor that your application is unusable.
You could use SQL replication to push (or pull) the data from each of the servers into a local copy on the application server. This will provide you with much better query performance (no round trip delay) while also ensuring that you have the latest data. There are lots of options with SQL replication you should be able to find something that suits your needs.
For more information on SQL Replication see http://technet.microsoft.com/en-us/library/ms151198.aspx
A linked server is only going to allow your databases to talk to each other. If the application is interacting with three discrete databases, then you simply need discrete connections. I would not recommend heavily using the linked servers unless you are moving a lot of data (since picking it up into the application and putting it into another database may take even longer).
I'm current on POS project. User require this application can work both online and offline which mean they need local database. I decide to use SQL Server replication between each shop and head office. Each shop need to install SQL Server Express and head office already has SQL Server Enterprise Edition. Replication will run every 30 minutes as schedule and I choose Merge Replication because data can change at both shop and head office.
When I'm doing POC, I found this solution not work properly, sometime job is error and I need to re-initialize its. This solution also take a very long time, which obviously unacceptable to user.
I want to know, are there any solutions better than one that I'm doing now?
Update 1:
Constraints of the system are
Almost of transactions can occur at
both shop and head office.
Some transaction need to work in real-time mode, that being said,
after user save data to their local shop that data should go to update at head office too. (If they're currently online)
User can working even their shop has disconnected from head office database.
Our estimation about amount of data is at-most 2,000 rows in each day.
Windows 2003 is OS of Server at head office and Windows XP is OS of all clients.
Update 2:
Currently they're about 15 clients, but this number will growing in fairly slow rate.
Data's size is about 100 to 200 rows per replication, I think it may not more than 5 MB.
Client connect to server by lease-line connection; 128 kbps.
I'm in situation that replication take a very long time (about 55 minutes while we've only 5 minutes or so) and almost of times I need to re-initialize job to start replicate again, if I don't re-initialize job, it can't replicate at all. In my POC, I find that it always take very long time to replicate after re-initialize, amount of time doesn't depend on amount of data. By the way, re-initialize is only solution I find it work for my problem.
As above, I conclude that, replication may not suitable for my problem and I think it may has another better solution that can serve what I need in Update 1:
Sounds like you may need to roll your own bi-directional replication engine.
Part of the reason things take so long is that over such a narrow link (128kbps), the two databases have to be consistent (so they need to check all rows) before replication can start. As you can imagine, this can (and does) take a long time. Even 5Mb would take around a minute to transfer over this link.
When writing your own engine, decide what needs to be replicated (using timestamps for when items changed), figure out conflict resolution (what happens if the same record changed in both places between replication periods) and more. This is not easy.
My suggestion is to use MS access locally and keep updating data to the server after a certain interval. Add a updated column to every table. When a record is added or updated, set the updated coloumn. For deletion you need to have a seprate table where you can put primary key value and table name. When synchronizing fetch all local records whose updated field not set and update (modify or insert) it to central server. Delete all records using local deleted table and you are done!
I assume that your central server is only for collecting data.
I currently do exactly what you describe using SQL Server Merge Replication configured for Web Synchronization. I have my agents run on a 1-minute schedule and have had success.
What kind of error messages are you seeing?
I need to implement a SQL Server replication solution. Very simple need for now. I just need to replicate one pretty simple table from 200 remote sites or so to one central server. The data is not really transactional in nature. I just need it moved up to the central server once a day. I can't decide if I should use push or pull, and I'm not sure if the distributor should live on the server side, or on all the clients.
The server and all the remote sites all live on a fairly decent VPN. The server is 2005, and it's not being pushed very hard at the moment. Just a few jobs here and there collecting data (which I want to get away from) and pushing reports/exports to various vendors once a day. The sites are a mix of 2000/2005.
I'd recommend you do some scalability tests first. Replication is very verbose in terms of agent jobs and T-SQL connections for reading and writing data. 200 publications you're talking 200 publisher agents, 200 subscription agents, plus the distributor maintenance. Most sites complain about maintenance problems of having 1 publisher and 1 subscriber... Say you manage to pull this off and operate it successfully, what is going to be your upgrade story? And how are you going to implement a schema change?
The largest replication deployment I heard of (some years ago) had I believe 450 publishers and was implemented by an army of Microsoft field consultants sweating for months to bend the behemoth into shape. Your 200 replication sites project is way more ambitious than you realize.
I suggest you explore some alternatives too. If you need a periodic table snapshot then SSIS can be a good match. If you need a continuous stream of changes then Service Broker can scale way way easier than replication.
If there is need to adjust the replication down the road, having the central server initiate a pull will be much easier to administrate than adjusting 200 sites to accomplish the same thing. Also, that would naturally manage the load, rather than some scheme to prevent, say, 100 remote sites all connecting at once.
Push subscriptions are the way to go here if you wish to centrally manage the data distribution of your application platform.
From what you have described you will need to make a choice between Snapshot Replication and Transactional Replication for your architecture.
Dependent on how much data you are looking to push and also the schedule of your updates will determine the most appropriate Replication Method for you to use. For example, if you looking to update all Subscriptions at the same time then dependent on how much data you need to push Snapshot Replication may not be suitable and you may be better off using Transactional Replication, perhaps pushed at specific determined intervals. Your network may even be able to support near real-time replication however conducting a small test of your environment will determine this for you. For example, setup the Publisher, local Distributor and a handful of Subscribers at geographically different locations on your network in order to test network transfer times and Replication Latency.
Things to consider:
How much data is to be moved across
the network? Size in Kb and record
volume.
Consider the physical location of
your sites
What is the suitability of your
network? Seed, capacity etc.
You may wish to consider using a
dedicated Distributor.