Essbase cube running long

I am new to Essbase, so I apologize for the newbie questions I will be asking.
We have an Essbase cube that was running fine last week, with an average runtime of around 5-6 hours. This week, however, it suddenly soared to 11 hours, double the previous average. One thing we observed is that the SQL statements being fired show as INACTIVE but are progressing. Our database admin team has confirmed that only 6% of the cube's total runtime is spent in the database; the other 94% is spent elsewhere. We are greatly concerned because users need the cube data early in the morning, and since the load has been completing late this week, the data they need is being delayed.
What are the things that we can check?
Hoping to hear from anyone soon. Thanks in advance.

Check your log files for the server/cube. You should be able to see entries that tell you how much time is spent on the data load, the calc(s), and other things. It sounds like the cube is spending a fair bit of time in the calculating state, but the logs will confirm this.
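For reference, the elapsed-time entries can be tallied up with a small script instead of scanning the log by hand. This is a minimal sketch; the exact log line format varies by Essbase version, so the regex below assumes entries shaped like `Total Calc Elapsed Time for [CalcAll.csc] : [5321.87] seconds` and may need adjusting against your own application log:

```python
import re

# Assumed entry shapes (adjust to your log):
#   Total Calc Elapsed Time for [CalcAll.csc] : [5321.87] seconds
#   Total Data Load Elapsed Time : [412.03] seconds
ELAPSED_RE = re.compile(r"Total (.+?) Elapsed Time.*?:\s*\[([\d.]+)\]\s*seconds")

def summarize_elapsed(log_lines):
    """Sum elapsed seconds per phase (Calc, Data Load, ...) from log lines."""
    totals = {}
    for line in log_lines:
        m = ELAPSED_RE.search(line)
        if m:
            phase, seconds = m.group(1), float(m.group(2))
            totals[phase] = totals.get(phase, 0.0) + seconds
    return totals
```

Comparing the per-phase totals from a fast week against a slow week should show immediately whether the extra 5 hours sit in the calc, the load, or neither.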

Related

AWS Lightsail Metric graphs "No data available"

We're using an AWS Lightsail PostgreSQL database. We've been experiencing errors with our C# application timing out when connecting to the database. While trying to debug the issue, I went to look at the metric graphs in AWS and noticed that many of the graphs have frequent gaps in the data, labeled "No data available". See image below.
This graph (and most of the other metrics) shows frequent gaps in the data. I'm trying to understand if this is normal, or could be a symptom of the problem. If I go back to 2 weeks timescale, there does not appear to be any other strange behaviors in any of the metric data. For example, I do not see a point in time in the past where the CPU or memory usage went crazy. The issue started happening about a week ago, so I was hoping the metrics would have helped explained why the connections to the PostgreSQL database are failing from C#.
So I guess my question is: are those frequent gaps of "No data available" normal for an AWS Lightsail PostgreSQL database?
Other Data about the machine:
1 GB RAM, 1 vCPU, 40 GB SSD
PostgreSQL database (12.11)
In the last two weeks (the average metrics show):
CPU utilization has never gone over 20%
Database connections have never gone over 35 (usually fewer than 5, and most often 0)
Disk queue depth never goes over 0.2
Free storage space hovers around 36.5 GB
Network receive throughput is mostly less than 1 kB/s (with one spike to 141kB/s)
Network transmit throughput is mostly less than 11kB/s with all spikes less than 11.5kB/s
I would love to view the AWS logs, but they start a month back, and when I try to view them they are filled with checkpoint starting/complete entries. Each page update only moves me about 2 hours forward in time (and takes ~6 seconds to fetch the logs), so covering the whole range would require ~360 page updates; when I tried, my auth timed out. 😢
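Those ~360 manual page updates can be avoided by fetching the logs programmatically: boto3's Lightsail client exposes `get_relational_database_log_events`, which accepts `startTime`/`endTime`, so the month can be split into fixed windows and each window fetched in one call. A sketch of the windowing helper (the database name and log stream name below are assumptions):

```python
from datetime import datetime, timedelta

def time_windows(start, end, hours=2):
    """Split [start, end) into fixed-width windows, one per log request."""
    windows = []
    step = timedelta(hours=hours)
    cur = start
    while cur < end:
        windows.append((cur, min(cur + step, end)))
        cur += step
    return windows

# Each window would then feed one call, roughly (assumed usage):
# client = boto3.client("lightsail")
# client.get_relational_database_log_events(
#     relationalDatabaseName="mydb", logStreamName="postgresql",
#     startTime=w_start, endTime=w_end)
```

Filtering out the checkpoint starting/complete noise in the script, rather than in the console, also makes the remaining entries far easier to scan.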
We never figured out the reason why, but it seems to have been a problem with the AWS Lightsail database itself. We ended up using a snapshot to create a new clone of the DB and pointing the C# servers at the new DB. The latency issues we were having disappeared, and the metric graphs now look normal (without the strange gaps).
I wish we were able to figure out the root of the problem. ATM, we are just hoping the problem does not return.
When in doubt, clone everything! 🙃

Interpreting cost data in Google Cloud Platform

I host a basic web app on Google Cloud Platform, and I've noticed my costs creeping up over the last couple of months. It's really accelerated over the last 30 days (fortunately, on a tiny base - I'm still ticking along at under $2 a day). I haven't added any new functionality or clients in months so this was a bit surprising.
My first instinct was an increase in traffic. I couldn't see anything like that in the App Engine dashboard, but I put in a heap of optimizations and dramatically decreased QPS just in case. No change.
The number of instances hasn't moved around much either - this looks like the most likely culprit but it's still just flat, not growing.
My next guess was that data was accumulating in Datastore (even though the cost chart is filtered to App Engine only, I figured a fuller datastore -> a slower datastore -> more instance time in GAE). There's no chart for this, annoyingly, but I determined the data store size was more or less flat (I have a blunt instrument TTL job that runs daily) and culled it by dropping my retention threshold by 20% just to be safe.
These optimizations were on the 17th, but my cost hasn't moved at all. I considered forex fluctuations (I'm billed in Aussie dollars, all my charges are for frontend instances in Japan) but they haven't been anywhere near big enough to explain this.
Any ideas what's going on? I've clicked through all the graphs and reports in billing but can't reconcile the ~100% growth in cost with flat or dropping QPS, instance count, and database size.
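One way to localize this kind of growth is to compare spend per SKU between the first and second half of the period, using daily cost rows (for example from a billing export; the row shape below is an assumption). Whichever SKU actually moved tells you whether the growth is frontend instance hours, Datastore operations, egress, or something else:

```python
from collections import defaultdict

def growth_by_sku(daily_costs):
    """daily_costs: iterable of (date, sku, cost) rows.
    Splits the date range in half and totals cost per SKU in each half,
    so the SKU whose spend actually grew stands out."""
    dates = sorted({d for d, _, _ in daily_costs})
    mid = dates[len(dates) // 2]
    totals = defaultdict(lambda: [0.0, 0.0])
    for d, sku, cost in daily_costs:
        totals[sku][0 if d < mid else 1] += cost
    return dict(totals)
```

A SKU whose second-half total roughly doubles its first-half total, while the others stay flat, is the line item to chase in the console.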
Yes! I've seen the same thing on a simple App Engine website running Python 3.7! I've had a ticket open since April 29th and they're not helpful. I saw a step change in frontend instance hours on March 24th with no corresponding increase in traffic. I have screenshots that are really telling but I can't upload them since I don't have 10 reputation points.
There's no corresponding increase in traffic, either in the cloud console or in Google analytics.
What's worse, each day the daily estimate shows I'll be under the 28-hour free quota. For example, I took a screenshot showing that after 15 hours I was on pace for 24.352 frontend instance hours for the day (I didn't take one at the end of the quota day, since it resets at 3 AM).
When I woke up the next morning, the billing report showed I was charged $0.00 for frontend instance hours for the previous day, but 3 hours later it shot up to $0.48, which corresponds to about 38.6 frontend instance hours.
Somehow, the estimate was off by 14 hours. Why have the estimate at all if it can have an error that large? When I looked at the minute-by-minute billed instance hours from the time of the screenshot through the end of the quota day, nothing indicates I would have used 23 additional hours in that window.
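The numbers above are internally consistent under a simple linear extrapolation (which is presumably roughly what the dashboard's estimate does, though that is an assumption):

```python
def projected_daily_usage(used_so_far, hours_elapsed):
    """Linear extrapolation of instance hours over a 24-hour quota day."""
    return used_so_far * 24 / hours_elapsed

# A pace of 24.352 hours after 15 elapsed hours implies
# 24.352 * 15 / 24 ≈ 15.22 hours actually used at screenshot time.
# A final bill of 38.6 hours means ~23.4 more hours accrued afterwards,
# matching the "23 additional hours" that the minute-by-minute data
# fails to account for.
```

So either the usage data behind the estimate was lagging badly, or a burst of instance hours was attributed to that day after the fact.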
This behavior has been happening every day since March 24th for me with no explanation from Google besides "it looks like you exceeded your instances..." I wish I could share the screenshots so you can compare what you're seeing.

How do I find and fix the true cause of page life expectancy issues?

We have been having performance problems on our SQL Server for the last month, and we are having a hard time getting them fixed.
This was the page life expectancy of one full week, one year ago:
You can see that we have frequent dips in the morning, but that's when our heavy batch processes kick in. During the day everything is peachy.
The same graph, but the current situation:
Big difference, lots of user complaints.
A detail of the day:
We have 64 GB of RAM in the server, which should be plenty. Normally when we have performance issues, we start by looking at the queries that cause the most total wait time (we have an analysis tool for that), then try to improve/remove/cache those queries, and usually that simply works.
In this case I have been following the same approach, but it does not seem to affect the PLE counters. How can I correctly identify the cause of these problems so I can fix what needs to be fixed?
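Before chasing individual queries, it is worth knowing what PLE floor is even reasonable for the server: the old flat 300-second rule predates large memory, and a commonly cited heuristic scales it with buffer pool size instead (roughly 300 seconds per 4 GB of buffer pool). A sketch of that arithmetic; the 56 GB buffer pool figure below is an assumption for a 64 GB server:

```python
def ple_floor_seconds(buffer_pool_gb):
    """Rule-of-thumb PLE floor: ~300 seconds per 4 GB of buffer pool."""
    return (buffer_pool_gb / 4) * 300

# With ~64 GB of RAM and a buffer pool around 56 GB, the floor is about
# 4200 seconds. Sustained PLE far below that points to broad memory
# pressure (large scans, missing indexes, spills) rather than one slow
# query, which would explain why tuning individual queries by wait time
# hasn't moved the counter.
```

If PLE is chronically below that floor, comparing current top queries by logical reads (not just wait time) against the year-ago baseline often surfaces the scan that is churning the buffer pool.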

Which Amazon EC2 instance would be suggested for an e-commerce website with Windows and SQL Server?

We are running an e-commerce venture that gets around 2,000 unique visitors a day. The total data is around 6 GB as of now.
We are using SQL Server as our database, and in the coming months the website may scale up to 10,000 users per day.
From this link I deciphered that it would be best to use an M1 instance, but could anyone help? I'm really clueless as to which of these options to purchase.
Note: our budget is around $170 per month.
EDIT: The number of concurrent users we have had is around 150.
I'd try to fit everything in memory. If you can't due to budget, you need to make sure the disk response times are up to par for your expected load. Applications vary widely: one visit to a homepage could generate many queries, or maybe you have application caching set up, so it's hard for anyone to just tell you. You should also get solid numbers on your peak number of concurrent users so you can plan for that. You don't mention your current environment, but gathering numbers on CPU, disk MB reads/writes per second, and memory used will help you pick the right size.
I'd look at the m1.xlarge. That gives you 15 GB of memory to play with. You'll be able to cache all the data you need, have some left over for the OS, and still have room to grow. CPU probably won't be your issue, but be sure to check your current usage.
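The "fit everything in memory" advice boils down to a simple capacity check. A sketch, where the 2 GB OS/overhead reservation and the 2x growth factor are assumptions you should replace with your own projections:

```python
def fits_in_memory(data_gb, instance_ram_gb, overhead_gb=2.0, growth_factor=2.0):
    """Rough check: does the working set (data * expected growth) fit in
    RAM after reserving headroom for the OS and SQL Server overhead?"""
    return data_gb * growth_factor + overhead_gb <= instance_ram_gb

# 6 GB of data, doubled for growth, plus ~2 GB overhead fits comfortably
# in an m1.xlarge's 15 GB, but not in an m1.medium's 3.75 GB.
```

That gap is essentially the trade-off the two answers here are weighing: the m1.xlarge caches everything, while the m1.medium fits the budget but forces the database onto disk.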
If you have some time to spend on it, I'd try setting up JMeter to do some load testing and see how many concurrent users you can max out with one of the cheaper options.
This topic may be better suited to ServerFault.
I'd suggest you look at reserved instances in the heavy-utilization category, where you can get interesting discounts if you plan to run the instance for a year or so.
But with that budget you should be thinking about an m1.medium instance, which might be a little tight for your requirements.

How to stop Azure database endless loop session

Last night I may have started an endless loop on my Azure website that copies data in the database. It's been about 24 hours, and my tables still keep growing. Can I stop it, or will it burn out at some point? My Azure subscription's CPU time is totally maxed out.
I even unlinked my database from the website, but the table keeps growing by a couple of rows per second.
I suggest you stop it as soon as possible using the portal. There are very few reasons that would cause it to 'burn out' and be recycled. The Azure Fabric Controller has no way of knowing that it is a rogue process, so it will never step in.
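Besides stopping the site in the portal, the runaway session itself can be killed on the database side: `sys.dm_exec_requests` shows active sessions and their CPU time, and `KILL <session_id>` terminates one (note that `KILL` takes a literal session id and cannot be parameterized). A sketch that builds the statements from queried rows; the 60-second CPU threshold and the pyodbc usage in the comment are assumptions:

```python
def kill_statements(sessions, min_cpu_ms=60_000):
    """Given (session_id, cpu_time_ms, status) rows from
    sys.dm_exec_requests, build KILL statements for long-running
    sessions worth inspecting."""
    return [
        f"KILL {sid};"
        for sid, cpu_ms, status in sessions
        if status == "running" and cpu_ms >= min_cpu_ms
    ]

# Rows would come from something like (assumed pyodbc usage):
#   cursor.execute(
#       "SELECT session_id, cpu_time, status FROM sys.dm_exec_requests")
#   stmts = kill_statements(cursor.fetchall())
```

Review the candidate sessions before executing anything: killing the wrong session rolls back its work, and a long rollback can itself take a while.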
I had to change the datatype on a column to make the loop run into an error and stop. Messed up. CPU time at 518% :S Good thing Azure is so cheap :)
