Business Objects 4: decrease the refresh time

I have a report with a refresh time of about 1 minute, that I reduced from 5 minutes by doing the following:
Displaying fewer elements (e.g. tables, charts)
Making the formulae as simple as possible (e.g. simplifying nested if statements)
Making the queries simpler (e.g. pulling through fewer columns)
My question is, are there any other ways for me to reduce refresh time that I haven't considered?

Talking around the office suggested that the methods I used were about as good as I could do from my side. The slow refresh times were caused by systemic under-investment by my employer, e.g.
Universes being clunky and incorrectly indexed
Slow internet connection/being restricted to IE
Slow laptop used to access BO4
This led me to build a business case for investing in the infrastructure, by quantifying the amount of time lost waiting for reports to refresh.

Related

Snowsight page displays too slowly when many queries are running

I'm using a Snowflake trial version to do a performance test.
I run 9 heavy queries (each taking about 20 minutes on an XS warehouse) at the same time and watch the warehouse or history pane. However, the page takes far too long to display: about 30 seconds.
I think the cloud services layer (like a Hadoop head node?) doesn't have adequate resources for this.
Is it because I'm using the trial version? If I use the Enterprise or Business Critical editions, will it still happen?
The "cloud services" layer is actually unrelated to your warehouses, and 9 queries is not enough to overload it. That said, trial accounts might be on a slightly underpowered cloud-services layer, though I am not sure why that would be the case; the allocated credit spend is just that.
I am puzzled about what you are trying to "test" by running many queries, each slow for the warehouse size, at the same time.
When you say the page takes 30 seconds to load, do you mean that if you do nothing the query execution status/time only updates every ~30 seconds, or that after a full page reload it stays blank for 30 seconds?

Rule of thumb to begin optimizing a page's database queries

I'm implementing database profiling in a website that will definitely see measurable growth over the next year. I'm implementing query profiling on each page (using Zend) and am going to log issues when a page gets too slow. At that point, I'll see what I can do to optimize the queries. The problem is that, without any experience with scaling a website, I'm not sure what "too slow" would be for the queries on a given page. Is there any accepted time limit for the queries on a given page before one should look for ways to optimize them?
Thanks,
Eric
There's no global "too slow". Everything depends on what the queries do and what's your traffic like. Invest some time in writing scenarios for a traffic generator and just load-test your website. Check which parts break first, fix them and repeat. Even the simple queries can hit some pathological cases.
Don't forget to load more fake data into the database too - more users are likely to generate more data for you and some problems may start only when the dataset is larger than your database caching/buffers. Make sure you're blaming the right queries too - if you have something locking the tables for update, other transactions may need retries / get delayed - look at the top N queries instead of fixating on one single query.
Make sure you look at the queries from both sides too - from the client and the server. If you're using MySQL, for example, you can easily log all queries which don't use indexes for joins/searches. You can also use Percona Toolkit (previously Maatkit) to grab the traffic off the network and analyse that instead. You can use MySQLTuner to see how many cache misses you experience. For other databases, you can find similar tools elsewhere.
If there is any general rule, I'd say - if your queries start taking 10x the time they took without any other load, you've got a problem. Also, it's not about queries - it's about page load time. Find an answer to "how long should the page generation take?" and go from there. (probably less than a second unless you do heavy data processing under the covers)
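One lightweight way to act on the advice above is to time every query and log only those that cross a per-page threshold. Here is a minimal sketch in Python (the question uses Zend/PHP, so the function names and the threshold value here are illustrative assumptions, not an accepted standard):

```python
import time

# Illustrative threshold; tune it per page against your own load tests,
# since there is no global "too slow".
SLOW_QUERY_THRESHOLD = 0.5  # seconds

def timed_query(run_query, slow_log):
    """Run a query callable, recording it in slow_log if it exceeds
    the threshold; returns the query's result either way."""
    start = time.perf_counter()
    result = run_query()
    elapsed = time.perf_counter() - start
    if elapsed > SLOW_QUERY_THRESHOLD:
        slow_log.append((run_query.__name__, elapsed))
    return result
```

Under load testing, reviewing `slow_log` per page (rather than per query) keeps the focus on page generation time, which is the number users actually feel.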

Strategy for caching of remote service; what should I be considering?

My web app contains data gathered from an external API of which I do not have control. I'm limited to about 20,000 API requests per hour. I have about 250,000 items in my database. Each of these items is essentially a cached version. Consider that it takes 1 request to update the cache of 1 item. Obviously, it is not possible to have a perfectly up-to-date cache under these circumstances. So, what things should I be considering when developing a strategy for caching the data. These are the things that come to mind, but I'm hoping someone has some good ideas I haven't thought of.
time since item was created (less time means more important)
number of 'likes' a particular item has (could mean higher probability of being viewed)
time since last updated
A few more details: the items are photos. Every photo belongs to an event. Events that are currently occurring are more likely to be viewed by clients (therefore they should take priority). Though I only have 250K items in the database now, that number increases rather rapidly (it will not be long until the 1 million mark is reached, maybe 5 months).
Would http://instagram.com/developer/realtime/ be any use? It appears that Instagram is willing to POST to your server when there's new (and maybe updated?) images for you to check out. Would that do the trick?
Otherwise, I think your problem sounds much like the problem any search engine has—have you seen Wikipedia on crawler selection criteria? You're dealing with many of the problems faced by web crawlers: what to crawl, how often to crawl it, and how to avoid making too many requests to an individual site. You might also look at open-source crawlers (on the same page) for code and algorithms you might be able to study.
Anyway, to throw out some thoughts on standards for crawling:
Update most often the things that change often. So, if an item hasn't changed in the last five updates, then maybe you can assume it won't change as often and update it less frequently.
Create a score for each image, and update the ones with the highest scores. Or the lowest scores (depending on what kind of score you're using). This is a similar thought to what is used by LilyPond to typeset music. Some ways to create input for such a score:
A statistical model of the chance of an image being updated and needing to be recached.
An importance score for each image, using things like the recency of the image, or the currency of its event.
Update things that are being viewed frequently.
Update things that have many views.
Does time affect the probability that an image will be updated? You mentioned that newer images are more important, but what about the probability of changes on older ones? Slow down the frequency of checks of older images.
Allocate part of your requests to slowly updating everything, and split up other parts to process results from several different algorithms simultaneously. So, for example, have the following (numbers are for show/example only--I just pulled them out of a hat):
5,000 requests per hour churning through the complete contents of the database (provided they've not been updated since the last time that crawler came through)
2,500 requests processing new images (which you mentioned are more important)
2,500 requests processing images of current events
2,500 requests processing images that are in the top 15,000 most viewed (as long as there has been a change in the last 5 checks of that image, otherwise, check it on a decreasing schedule)
2,500 requests processing images that have been viewed at least
Total: 15,000 requests per hour.
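The scoring idea above can be sketched in a few lines. This is a Python illustration with made-up weights and field names (`likes`, `views`, `event_active` and the bonus values are assumptions for the example, not part of any real API):

```python
from datetime import datetime, timedelta

def refresh_score(item, now):
    """Heuristic refresh priority: newer items, items in active events,
    and popular items score higher. All weights are illustrative."""
    age_hours = (now - item["created"]).total_seconds() / 3600
    score = item["likes"] + item["views"] / 10
    if item["event_active"]:
        score += 100                      # current events take priority
    score += max(0, 48 - age_hours)       # recency bonus, first two days
    return score

def pick_batch(items, budget, now):
    """Spend this hour's API request budget on the highest-scoring items."""
    ranked = sorted(items, key=lambda it: refresh_score(it, now), reverse=True)
    return ranked[:budget]
```

Each hour you would call `pick_batch` with your remaining request budget (for example, the 2,500-request slices suggested above) and refresh only the returned items.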
How many (unique) photos/events are viewed on your site per hour? Photos that are not viewed probably don't need to be updated often. Do you see any patterns in views for old events/photos? Old events might not be as popular, so perhaps they don't have to be checked that often.
andyg0808 gives good, detailed information; however, it is important to know the patterns of your data usage before applying it in practice.
At some point you will find that 20,000 API requests per hour will not be enough to update frequently viewed photos, which might lead you to different questions as well.

How to decrease the response time when dealing with SQL Server remotely?

I have created a vb.net application that uses a SQL Server database at a remote location over the internet.
There are 10 vb.net clients working at the same time.
The problem is the delay that occurs when inserting a new row or retrieving rows from the database: the form appears to freeze for a while whenever it deals with the database. I don't want to use a BackgroundWorker to overcome the freeze problem.
I want to eliminate that delay, or at least reduce it as much as possible.
Any tips, advice or information are welcome; thanks in advance.
Well, 2 problems:
The form appears to be freezing for a while when it deals with the database, I don't want to use a background worker
to overcome the freeze problem.
Wishes and reality rarely mix. ANY operation that takes more than a SHORT time (0.1-0.5 seconds) SHOULD run async; it is the only way to keep the UI responsive. Regardless of what the issue is, if an operation CAN take longer, or runs over the internet, decouple it from the UI.
But:
The problem is in the delay time that happens when inserting new records or retrieving records from the database,
So, what IS the problem? Seriously. Is it latency (too many round trips; work on more efficient SQL, and batch requests so you do not send 20 queries, waiting for a result after each), or is the server overloaded? It is not clear from the question whether this really is a latency issue.
At the end:
I want to eliminate that delay time
Pray to whatever god you believe in to change the laws of physics (mostly the speed of light), or to your local physicist to finally make quantum teleportation workable at low cost. Packets currently take time to travel; there is no way to change that.
Check whether you use too many round trips. NEVER (!) use SQL Server remotely with raw SQL - put in a web service and make it fit the application, possibly even down to a 1:1 match with your screens, so you can ask for data and send updates in ONE round trip, not a dozen. When we did something similar 12 years ago with our custom ORM in .NET, we used a data access layer that accepted multiple queries in one run and returned multiple result sets for them - so a form with 10 drop-downs could ask for all 10 data sets in ONE round trip. If a request takes 0.1 seconds of internet time, then this saves 0.9 seconds. We had a form with about 100 (!) round trips (building a tree) and got that down to fewer than 5 - going from "this takes time" to "wow, there it is". Plus it WAS async, sorry.
Then realize that moving a lot of data is SLOW unless you have instant high-bandwidth connections.
This is exactly what async is made for - if you have transfer-time or latency issues that cannot be optimized away and you still refuse to use async, you will go on delivering a poor experience.
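The "many queries in one round trip" data-access layer described above might look like the following sketch. Python with an in-memory SQLite database stands in for the remote SQL Server here; the point is the interface, which accepts several queries and returns all their result sets from a single call instead of one round trip per query:

```python
import sqlite3

def fetch_many(conn, queries):
    """Run several (sql, params) queries over one connection and return
    all result sets together -- the moral equivalent of collapsing a
    dozen client round trips into one service call."""
    cur = conn.cursor()
    return [cur.execute(sql, params).fetchall() for sql, params in queries]
```

A form with ten drop-downs would then build a list of ten queries and make one `fetch_many` call, rather than ten sequential request/response cycles each paying full network latency.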
You can execute the SQL call asynchronously and let Microsoft deal with the background process.
http://msdn.microsoft.com/en-us/library/7szdt0kc.aspx
Please note, this does not decrease the response time from the SQL server, for that you'll have to try to improve your network speed or increase the performance of your SQL statements.
There are a few things you could potentially do to speed things up, however it is difficult to say without seeing the code.
If you are using generic inserts - start using stored procedures
If you are closing the connection after every command then... well, don't. Establishing a connection is typically one of the more expensive operations
Increase the bandwidth of the pipe between the two
Add an index
Investigate your SQL Server; perhaps it is not set up optimally
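To illustrate the point about not closing the connection after every command, here is a Python/SQLite sketch contrasting the two patterns (the table name `items` and the file path are invented for the example; the same principle applies to SQL Server connection pooling):

```python
import sqlite3

# Anti-pattern (sketch): open and close a connection per statement,
# paying the connection-establishment cost on every single row.
def insert_opening_each_time(db_path, rows):
    for r in rows:
        conn = sqlite3.connect(db_path)
        with conn:
            conn.execute("INSERT INTO items VALUES (?)", (r,))
        conn.close()

# Better: keep one long-lived connection and batch the work into
# a single transaction.
def insert_reusing_connection(conn, rows):
    with conn:
        conn.executemany("INSERT INTO items VALUES (?)", [(r,) for r in rows])
```

Over a remote link the difference is magnified, since each new connection also pays network latency before the first statement runs.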

Storing and evaluating performance data

Since we suffer from creeping degradation in our web application, we decided to monitor our application's performance and measure individual actions.
For example, we will measure the duration of each request and the duration of individual actions like editing a customer, creating an appointment, or searching for a contract.
In most cases the database is the bottleneck for these actions.
I expect the accumulated data to be quite large, since we will gather 1-5 individual actions per request.
Of course it would be nonsense to insert each and every element into the database, since this would slow down every request even more.
What is a good strategy for storing and evaluating this per-request data?
I thought about having a global Queue object that gets appended to, and a separate thread that empties the queue and handles persistent storage to a file. But where should such data be stored? Are there any prebuilt tools for such a visualisation?
We use Java, Spring, mixed Hibernate + JDBC + PL/SQL, and Oracle.
The question should be language-agnostic, though.
Edit: the measurement will be taken in production over a long period of time.
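The queue-plus-background-thread design described in the question could be sketched like this in Python (the actual stack is Java/Spring, so this is only an illustration; `sink` stands for any function that persists a batch, e.g. appending to a file or bulk-inserting into a table):

```python
import queue
import threading

class MetricsBuffer:
    """Collect per-request timings in memory and flush them in batches
    from a background thread, keeping the request path fast."""

    def __init__(self, sink, batch_size=100):
        self._q = queue.Queue()
        self._sink = sink                # persists one batch of records
        self._batch_size = batch_size
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, action, duration_ms):
        self._q.put((action, duration_ms))   # cheap; never blocks a request

    def _drain(self):
        batch = []
        while True:
            item = self._q.get()
            if item is None:                 # shutdown sentinel
                break
            batch.append(item)
            if len(batch) >= self._batch_size:
                self._sink(batch)
                batch = []
        if batch:                            # flush the remainder on close
            self._sink(batch)

    def close(self):
        self._q.put(None)
        self._worker.join()
```

The batch size trades freshness against write amplification; a production version would also flush on a timer so a quiet hour doesn't strand records in memory.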
It seems like your archive strategy will be at least partially dependent on the scope of your tests:
How long do you intend to collect performance data?
What are you trying to demonstrate? Performance improvements over time? Improvements associated with specific changes? (Like perf issues for a specific set of releases)
As for visualization tools, I've found Excel to be pretty useful for small to moderate amounts of data.
