We provide a critical application for a customer. It's a ClickOnce WinForms application that consumes several WCF services, which in turn communicate with an Oracle database.
The services are hosted on Oracle Application Server, with two Web Cache servers in front for load balancing. The database is on a separate machine.
The thing is, the application now has poor performance and we need to speed it up. We have tried many techniques: optimizing queries by adding indexes after analyzing explain plans, reducing service calls from the client, and profiling the client application for pitfalls.
But I would really like to set up a caching layer over the database or the WCF services. The data is critical and changes quite often, so it's necessary to get the latest data on every request.
That means when data changes in the database, the cache should be expired immediately. The queries are complex, with up to 14-15 joins...
What is the right way to do this, and which tools/frameworks should I use? I have heard of memcached... is it any good?
Because your code sees all updates to the data, you can have a very effective caching layer, since the cache can be updated at the same time as the database.
With your requirement for absolute cache coherency, you need to make sure all servers see the same cache. There are two approaches you could take:
Have a cache server, using something like the ASP.NET cache, which the application servers talk to in order to get and update the data
Use a caching product to maintain the cache
If you use a caching product, there are a number on the market: memcached, GemFire, Coherence, Windows Server AppFabric Caching and more.
The nice thing about AppFabric Caching (the project formerly known as Velocity) is that it is free with Windows Server and is very .NET-friendly (although it is newer than some of the others, so you might say less proven).
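To make that concrete, here is a minimal C# sketch of the read/update flow. The ICacheClient interface and the repository methods are hypothetical placeholders for whichever product's client API you end up choosing (memcached via a .NET client, AppFabric, etc.):

```csharp
using System;
using System.Data;

// Hypothetical abstraction over the cache product you choose
// (memcached client, AppFabric DataCache, etc.) -- not a real API.
public interface ICacheClient
{
    void Set(string key, object value, TimeSpan ttl);
    bool TryGet<T>(string key, out T value);
    void Remove(string key);
}

public class CustomerRepository
{
    private readonly ICacheClient _cache;
    private readonly IDbConnection _db;   // opened and managed elsewhere

    public CustomerRepository(ICacheClient cache, IDbConnection db)
    {
        _cache = cache;
        _db = db;
    }

    public Customer GetCustomer(int id)
    {
        string key = "customer:" + id;
        if (_cache.TryGet(key, out Customer cached))
            return cached;                                // cache hit

        Customer fromDb = LoadCustomerFromDb(id);         // your existing 14-15 join query
        _cache.Set(key, fromDb, TimeSpan.FromMinutes(10));
        return fromDb;
    }

    public void UpdateCustomer(Customer customer)
    {
        SaveCustomerToDb(customer);                       // write to Oracle first
        // Because all writes go through your code, you can refresh (or simply
        // invalidate) the cache entry in the same operation, so readers never
        // see stale data even before the TTL expires.
        _cache.Set("customer:" + customer.Id, customer, TimeSpan.FromMinutes(10));
    }

    private Customer LoadCustomerFromDb(int id) { throw new NotImplementedException(); }
    private void SaveCustomerToDb(Customer customer) { throw new NotImplementedException(); }
}

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}
```

The important part is that UpdateCustomer touches the cache in the same code path as the database write; the TTL is only a safety net, not the primary expiry mechanism.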
Before adding a new tool you should make sure you're correctly using all of the Oracle caching that is available to you.
There's the buffer cache, the PL/SQL function result cache, the client query result cache, the SQL query result cache and materialized views, and using bind variables will help cache query plans.
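For example, the SQL query result cache and bind variables can be used straight from your service layer. A minimal sketch, assuming ODP.NET (Oracle.ManagedDataAccess.Client) and a hypothetical employees query; the hint only helps if the result cache is actually enabled on the server (RESULT_CACHE_MODE / RESULT_CACHE_MAX_SIZE):

```csharp
using Oracle.ManagedDataAccess.Client;  // ODP.NET managed driver (an assumption about your data access stack)

public static class SalaryQueries
{
    public static decimal GetDepartmentTotal(string connectionString, int deptId)
    {
        // RESULT_CACHE asks Oracle to serve repeated executions of this query
        // from the server-side result cache until the underlying tables change;
        // the bind variable keeps the parsed plan reusable across calls.
        const string sql =
            "SELECT /*+ RESULT_CACHE */ SUM(salary) " +
            "FROM employees WHERE department_id = :dept_id";

        using (var conn = new OracleConnection(connectionString))
        using (var cmd = new OracleCommand(sql, conn))
        {
            cmd.BindByName = true;
            cmd.Parameters.Add(new OracleParameter("dept_id", deptId));

            conn.Open();
            return System.Convert.ToDecimal(cmd.ExecuteScalar());
        }
    }
}
```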
Related
I have conducted performance testing on an e-commerce website hosted on Azure, and I am checking the Azure logs for the test duration to find scaling issues. From the logs I saw a lot of "InProc" dependency failures, as well as a lot of "Technical exception" entries with the message "Cart not recalculated for remove shipping methods". I would like to know whether this indicates any scaling issues, and what I should check for when looking for scaling issues (for example, slow database queries). I am very new to performance testing and Azure, so any help will be much appreciated. Thanks!!
Performance can be improved by using the Cache-Aside pattern - caching can improve application performance.
Data from a data store can be loaded into a cache on demand. This can help increase performance while also ensuring consistency between the cache and the underlying data store.
Many commercial caching systems provide read-through and write-through/write-behind operations. In these systems, an application retrieves data by referencing the cache; if the data isn't already in the cache, it's fetched from the data store and added to the cache. Any changes to data held in the cache are automatically written back to the data store as well.
If the cache does not provide this functionality, it is the responsibility of the applications that use the cache to maintain the data.
The cache-aside technique lets an application emulate the functionality of read-through caching: it stores data in the cache only when that data is actually requested.
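A minimal cache-aside sketch in C#, assuming Microsoft.Extensions.Caching.Memory and hypothetical LoadProductFromStoreAsync / SaveProductToStoreAsync data-access methods:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public class ProductService
{
    private readonly IMemoryCache _cache;

    public ProductService(IMemoryCache cache) => _cache = cache;

    public async Task<Product> GetProductAsync(int id)
    {
        string key = "product:" + id;

        // 1. Try the cache first.
        if (_cache.TryGetValue(key, out Product cached))
            return cached;

        // 2. On a miss, load from the data store...
        Product product = await LoadProductFromStoreAsync(id);

        // 3. ...and add it to the cache for subsequent requests.
        _cache.Set(key, product, TimeSpan.FromMinutes(5));
        return product;
    }

    public async Task UpdateProductAsync(Product product)
    {
        await SaveProductToStoreAsync(product);
        // Invalidate so the next read repopulates the cache from the store.
        _cache.Remove("product:" + product.Id);
    }

    // Placeholders for the real data-access code.
    private Task<Product> LoadProductFromStoreAsync(int id) => throw new NotImplementedException();
    private Task SaveProductToStoreAsync(Product product) => throw new NotImplementedException();
}

public class Product { public int Id { get; set; } public string Name { get; set; } }
```

On a read the cache is consulted first and only populated on a miss; on a write the entry is invalidated so the next read reloads fresh data from the store.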
I have 2 web servers, and I'm running into an issue where I need to prematurely expire (remove) a cached item. Since I'm currently using IMemoryCache, a Remove(key) call only removes the cached item from one server. I don't have the ability to leverage Redis, NCache, etc., but the app is already using SQL Server. I can easily set up distributed caching with a cache table, but it seems counter-intuitive, because what I'm caching is user data that I don't want to hit the database for on every call (e.g., I cache 50 items of user data every 5 minutes, which has cut down on 500 trips to the database). Is there something I'm missing which would make using SQL Server as my distributed cache backend actually beneficial?
Sounds like you are having the typical problem of cache invalidation and expiry. You can use a grid cache for distributed caching (e.g. Redis, Hazelcast), but that doesn't solve the invalidation problem. You may want to consider vendors like ScaleArc or Heimdall Data. They provide the caching logic: you choose the storage of your choice (in-memory, Redis, etc.) and it handles query caching and invalidation. There is a SQL Server blog post on it: https://www.itprotoday.com/industry-perspectives/reduce-sql-server-costs-heimdall-data-caching
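If you do end up using SQL Server as the backing store, this is roughly what the built-in distributed cache looks like. A sketch assuming ASP.NET Core with the Microsoft.Extensions.Caching.SqlServer package and a cache table created beforehand (e.g. with dotnet sql-cache create); because every server reads and writes the same table, RemoveAsync takes effect for both web servers:

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.DependencyInjection;

public static class CacheSetup
{
    // Startup/Program registration: both web servers point at the same cache table.
    public static void ConfigureCache(IServiceCollection services, string connectionString)
    {
        services.AddDistributedSqlServerCache(options =>
        {
            options.ConnectionString = connectionString;
            options.SchemaName = "dbo";
            options.TableName = "AppCache";   // created beforehand with `dotnet sql-cache create`
        });
    }
}

public class UserDataCache
{
    private readonly IDistributedCache _cache;
    public UserDataCache(IDistributedCache cache) => _cache = cache;

    public Task StoreAsync(string userId, UserData data) =>
        _cache.SetStringAsync(
            "user:" + userId,
            JsonSerializer.Serialize(data),
            new DistributedCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
            });

    public async Task<UserData> TryGetAsync(string userId)
    {
        string json = await _cache.GetStringAsync("user:" + userId);
        return json == null ? null : JsonSerializer.Deserialize<UserData>(json);
    }

    // Removes the row from the shared table, so the item disappears for
    // every server, not just the one handling this request.
    public Task InvalidateAsync(string userId) =>
        _cache.RemoveAsync("user:" + userId);
}

public class UserData { public string Name { get; set; } }
```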
My team has run into a design conflict. We are working on a project that involves scraping historical data from Yahoo for all stocks for the last year, to run some ML analysis on it. The latency is unbearably high, and I'm not sure if it's the network or the web scraper. I proposed we use AWS RDS to store the data so we can access it more quickly. However, a team member said that storing the data in the cloud would not solve our latency issue. I rebutted with the fact that the data would be organized and stored in a way that lets us access it significantly faster. He came back with something else, and this went on. Is it true that a cloud DB won't offer any additional speed compared to a scraper? If so, does AWS have a service that allows us to access the data we store faster through another service, almost as if the database were on our own server?
I am not all that familiar with cloud services, but I do understand databases pretty well. So please dumb down the AWS stuff if you wish, and feel free to point me to any duplicates or links that may help me understand this better.
Lots of good reasons to use RDS as a database, but speeding up your scraping isn't one of them - it likely isn't your bottleneck.
I have written lots of scrapers over the years, and by far the biggest performance boost comes from having a fast network connection between the scraper machine(s) and the host you are scraping; even then, using a multi-threaded scraper on each scraping machine will give you another huge speed improvement.
Most of the time spent scraping is spent waiting for the host to return results to you, not parsing the page and not saving the data to a database.
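As a rough illustration of the multi-threaded (really, concurrent) fetching point, here is a C# sketch using HttpClient; the URL list and the degree of parallelism are placeholders you would tune for your own scraper:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrentScraper
{
    private static readonly HttpClient Http = new HttpClient();

    // Fetches many pages concurrently instead of one at a time, so the
    // time spent waiting on the remote host overlaps across requests.
    public static async Task<IReadOnlyList<string>> FetchAllAsync(
        IEnumerable<string> urls, int maxConcurrency = 8)
    {
        using (var throttle = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = urls.Select(async url =>
            {
                await throttle.WaitAsync();
                try
                {
                    return await Http.GetStringAsync(url);  // most of the time is spent here
                }
                finally
                {
                    throttle.Release();
                }
            });
            return await Task.WhenAll(tasks);
        }
    }
}
```

The SemaphoreSlim keeps the number of in-flight requests bounded so you don't hammer the host, while still overlapping the wait time across many pages.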
A MySQL DB on AWS RDS would be the same as the one that you'd install yourself on some machine. So, it isn't going to be different or slower just because it is in the cloud.
If you scrape some data and process it only once, then there is no point in introducing a DB in between. But if your scraper is slow and you process the scraped data multiple times, then storing it in a DB should improve latencies, because the latency of a DB read will be much lower than that of scraping (assuming you design your DB schema properly, your hosts are in the same availability zones, or at least regions, as your DB, etc.).
For example, if scraping a web page takes ~10s and you process the scraped data twice, it would take you ~20s without a DB. With a DB that has read latencies of ~500ms, you'd only take ~11s.
I have an application that calls a third-party product. This product makes a large number of database calls as part of its processing, although the data in the database is generally static (it is only updated maybe once per day by a refresh mechanism).
As the data is largely static but the database is under heavy load, I want to put a caching layer in between the application and the database. As the application is a third-party product (written in C), I can't implement the cache in code myself. So...
Is there a product or tool that can transparently sit between the application and the database and act as a cache? Something that can intercept the requests, respond with cached data if it has it, or pass them down to the database if it does not.
I know that databases have their own caches, but I am looking to offload the work to the application's server rather than the database server, to reduce the load on the DB server (which has significant licence costs).
Has anyone had any experience scaling out SQL Server in a multi-reader, single-writer fashion? If not, can anyone suggest a suitable alternative for a read-intensive web application that they have experience with?
It probably depends on two things:
How big is each single write?
Do readers need real-time data?
A write will block readers while it is writing, but if each write is small and fast then readers won't notice.
If you offload, say, end-of-day reporting, then batching that load onto a separate server makes sense, because those readers do not require real-time data.
A write on your primary server must be synced to your offload secondary server, which will block there as part of the sync process anyway, plus you add overhead to manage the sync.
Most apps are 95%+ read anyway. For example, an update or a delete is a read followed by a write.
My choice would probably be (based on the low write volume and the fact that it's a web app) to scale up and put as much RAM as I could into the DB server, with separate disk paths for the data and log files of the database.
I don't have any experience with scaling out SQL Server for your scenario.
However, for a read-intensive application I would look at reducing the load on the database and employing a caching strategy, using something like Memcached or MS Velocity.
There are two approaches that I'm aware of:
Load the entire database into the cache and manage adding and updating items in the cache yourself (sketched after this list).
Add items to the cache only when they are requested, and remove them when a write operation is performed.
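A rough sketch of the first approach, shown with an in-process IMemoryCache for brevity (with Memcached or Velocity the shape is the same, just through that product's client API); the WarmUp, Get and Update methods and the Product type are hypothetical names:

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Caching.Memory;

public class ProductCatalogCache
{
    private readonly IMemoryCache _cache;

    public ProductCatalogCache(IMemoryCache cache) => _cache = cache;

    // Run once at application start: pull the whole (read-heavy) data set
    // into the cache so reads never touch the database.
    public void WarmUp(IEnumerable<Product> allProducts)
    {
        foreach (var p in allProducts)
            _cache.Set("product:" + p.Id, p);
    }

    public Product Get(int id) =>
        _cache.TryGetValue("product:" + id, out Product p) ? p : null;

    // Called by the (single) writer after it has written to the database,
    // keeping the cached copy in step with the table.
    public void Update(Product p) => _cache.Set("product:" + p.Id, p);
}

public class Product { public int Id { get; set; } public string Name { get; set; } }
```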
Some kind of replication would do the trick.
http://msdn.microsoft.com/en-us/library/ms151827.aspx
You of course need to change your app code.
Some people use partitioned tables, with different row ranges stored on different servers and united with views. This would be invisible to the app. I think the term for this practice is federation.
By designing your database, application and server configuration carefully (SQL Server particulars: the location of the data, log, system and tempdb files and the SQL binaries), you should be able to handle a pretty good load. Try not to complicate things if you don't have to.