Daily change each page on google app engine - google-app-engine

I have deployed an app on Google App Engine.
My app has one cron job that parses data from the Internet and stores it in my db.
When a user uses my app, it extracts the data from the db and shows it to the user.
I found that this is too time-consuming and results in too many requests from the db.
I want to regenerate each page when the cron job runs daily,
so that users can see the page without querying my database.
How can I do that in GAE?
Thank you for your reply.

Not nearly enough info in the question to help you. For example, what does "too many request from db" mean? Is it because you have a lot of traffic? Or you are querying the db too much?
Possible solutions are:
Edge cache your page: https://groups.google.com/forum/#!topic/google-appengine/6xAV2Q5x8AU/discussion
Store your page in memcache.
Optimize your database accesses. Most likely you're doing something very inefficiently.
Use a cronjob to generate your page, store it in the blobstore, and redirect your fetches to the blobstore. You can do this, but this is a pretty dumb way to go about it, given that there are better options.

I'm afraid it's a limitation of GAE, according to this post, no matter how much caching or how many tricky solutions you find. They just worsen Google's policies.
You can't get what you need unless you write directly into the HTML file and store it on the server every time, which is far more resource-consuming and, in my opinion, just pointless. Since GAE is a free service intended for testing, you should make do with what you have, or pay.

Related

Avoid DoS attacks on App Engine

I have a small question with possibly a complex answer. I have tried to research around, but I think I may not know the keywords.
I want to build a web service that will send a JSON response, which would be used for another application. My goal is having the App Engine server crawl a set of webpages and store the relevant values so the second application (client) would not need to query everything. It will only go to my server with the already condensed information.
I know it's pretty common, but how can I defend against attackers who wish to exhaust my App Engine resources/quota?
I have been thinking of limiting the number of requests per IP (say, 200 requests / 5 minutes), but is that feasible? Or is there a better, more clever way of doing it?
First, you need to cache the JSON. Don't hit the datastore for every request. Use memcache or, depending on your requirements, cache the JSON in a static file in Cloud Storage. This simple step is the best defense against DDoS, since every request then adds minimal overhead.
Also, take a look at the DoS protection service offered by App Engine:
https://developers.google.com/appengine/docs/java/config/dos
You could require users to log-in then generate and send an auth key to the client app that must accompany any requests to the app engine service.
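The per-IP limit proposed in the question (200 requests per 5 minutes) could be sketched as a sliding-window counter like the one below. This is an illustration, not an App Engine feature: `SlidingWindowLimiter` is a hypothetical class, and it keeps its state in process memory, so on a multi-instance deployment the counters would have to live in a shared store such as memcache instead.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests=200, window_seconds=300):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Assumption: per-process state; not shared between instances.
        self._hits = defaultdict(deque)

    def allow(self, ip, now=None):
        """Return True and record the request if the client is under its limit."""
        now = time.time() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A request handler would call `limiter.allow(client_ip)` first and return an HTTP 429 response when it yields `False`, before touching the cache or the datastore.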

What is the suitable db for bulk writes

My application is currently hosted on App Engine. It continuously writes records (for logging and reporting).
Scenario: view counts on the website. When a visitor opens the website, it hits the server to add a record with the time and type of view. These counts are shown in the user's dashboard.
These requests are now very frequent, currently 40/sec. Google App Engine writes are getting heavy and the cost is increasing rapidly.
Is there any way to reduce this, or is there another db better suited to logging the views?
Google App Engine's Datastore is NOT suitable for such a requirement where you have to continuously write to datastore and read less often.
You need to offload this task to a third party service (either you write one or use existing one)
A better option for user tracking and analytics is Google Analytics (although you won't be able to show the hit counters directly on the website using Analytics).
If you want to show your user page hit count use a page hit counter: https://www.google.com/search?q=hit+counter
In this case you should avoid Datastore.
For this kind of analytics it's best to do the following:
Dump data to the GAE log (yes, this sounds counter-intuitive, but it's actually advice from Google engineers). The GAE log is persistent and is guaranteed not to lose data you write to it.
Periodically parse the log for your data and then export it to BigQuery.
BigQuery has a quite powerful query language so it's capable of doing complex analytics reports.
Luckily this was already done before: see the Mache framework. Also see related video.
Note: there is now a new BigQuery feature called streaming inserts, which could potentially replace the cumbersome middle step (files on Cloud Storage) used in Mache.
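The "write to the log, parse it periodically" step can be sketched as follows. The log line format `VIEW <ISO timestamp> <view type> <page>` is an assumption made for this example, as is the function name `aggregate_views`; neither comes from Mache or from App Engine. The aggregated rows are what a periodic job would then export, so the datastore sees one write per counter per batch instead of one write per view.

```python
from collections import Counter


def aggregate_views(log_lines):
    """Count views per (day, page, view type) from hypothetical log lines.

    Assumed line format: "VIEW 2014-01-31T12:00:00 <view_type> <page>".
    Lines that do not match the format are skipped.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 4 or parts[0] != "VIEW":
            continue
        _, timestamp, view_type, page = parts
        day = timestamp[:10]  # keep only the YYYY-MM-DD part
        counts[(day, page, view_type)] += 1
    return counts
```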

In Python and GAE: How to permanently cache data from a Datastore across HTTP GET requests

I am developing an online product using GAE and Python. Certain data in my Model (i.e. Datastore) are constant across Contexts: which means for all incoming HTTP GET requests, those data don't change.
For the sake of argument, assume that said data must live in the Datastore as opposed to static pages (e.g html).
How would I set the Google App Engine Caching policies so that the Datastore is only queried once in the life of the application -- even if the product is experiencing millions of hits per day?
DISCLAIMER: I am a complete newbie to both Python and GAE.
I am presently looking into global variables, which I would use to store said query results. Not only do I not yet know how that would work, there is another problem: Different HTTP GET requests (i.e. urls) are for different portions and views of said constant data.
Thanks for any insight.
You may want to take a look at the Memcache API. It will allow you to do basically what you want: cache the results of a query (or even the resulting page as HTML) and serve it while it is available (you can set an expiry, but you will also occasionally experience cache misses where the datastore is queried anyway). Also, as @voscausa mentions, switching your datastore API from db to ndb will provide automatic caching with additional options to further modify caching behavior (docs here).
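The get-or-load pattern described in this answer looks roughly like the sketch below. To keep it runnable outside App Engine, it uses a small hypothetical `TTLCache` class as a stand-in for memcache; `get_constant_data` and `loader` are likewise illustrative names. Note that a cache miss or an expiry still falls through to the datastore, so "queried only once in the life of the application" is not something this pattern guarantees.

```python
import time


class TTLCache:
    """Hypothetical in-memory stand-in for memcache with per-key expiry."""

    def __init__(self):
        self._store = {}

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            # Entry has expired: evict it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl, now=None):
        now = time.time() if now is None else now
        self._store[key] = (value, now + ttl)


def get_constant_data(cache, key, loader, ttl=3600, now=None):
    """Return the cached value, calling loader (the datastore query) on a miss."""
    value = cache.get(key, now)
    if value is None:
        value = loader()
        cache.set(key, value, ttl, now)
    return value
```

Different URLs that show different views of the same constant data can share one cached query result and differ only in how they render it, or each view can be cached under its own key.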

Google app engine for websites which updates every second?

I need advice on whether it is worthwhile to explore the Google App Engine option, so if a knowledgeable and experienced user could comment, it would really help (I do not need code).
Present Scenario:
I have a website where the data needs to be updated every second. It is built on .NET, and users need to see updated data every time they visit; the data changes every second. The users have bookmarked the URLs, so the data changes while the URL remains the same.
We also have a lot of static data, which users access for research and reading.
Experience with cloud:
We tried hosting the website with one of the big players (not the original cloud company, but their nearest competitor ;). We had problems with files getting stuck at times (essentially some users saw the update and some did not), and they had a 'Modified Trust' rights level implemented, which restricted us in multiple places (auto-generating files in a directory).
My Questions:
(a) Do you think Google App Engine could help in the above scenario?
(b) URL rewriting: more specifically, would it be possible to return a 200 from the server instead of a 404, or to trap the 404, convert it into a 302, and redirect?
(c) Hosting fees burned a hole in our pocket when we moved from traditional hosting to the cloud, and we are now back on a traditional server with a load balancer. For a heavy-traffic site, do you think we should stick with traditional hosting or look at Google App Engine to lower our costs?
I look forward to hearing your comments.
Thanking everyone in advance.
(a) Do you think Google App Engine could help in the above scenario?
The problem with users not seeing data is a factor of caching or eventual consistency in your database. That's not going to be "solved" by moving to a new cloud provider. The appengine datastore uses eventual consistency, but you can solve that problem by using memcache to store data that changes frequently. That said, Appengine doesn't give you complete control over memcache so you may still have problems solving that issue.
(b) URL rewriting: more specifically, would it be possible to return a 200 from the server instead of a 404, or to trap the 404, convert it into a 302, and redirect?
Not really sure what you mean here. You can certainly return 302 or 200 responses instead of 404s using any web framework worth its salt.
(c) When designed well, appengine can be very cost effective, but when not optimized it can be a money sink... there are a lot of good papers out there about how to effectively optimize it, but if you are talking about a lot of users hitting the site every second you are going to pay for it.

Web Analytics & Stats

We want to add tracking statistics to a web application we are building but are pretty unsure of how to go about it. (i.e. clicks, pageviews, unique visits etc)
Does anyone have any articles on the best way to go about incorporating tracking data into an application ? i.e. javascript tracking or IIS etc ?
We want to add tracking in as a ASP.NET MVC module - but we are unsure as to the best way to actually get the data and essentially 'track' this information ?
If anyone could help out - much appreciated.
Edit: just to be clear, we want to do this in-house and present the stats to our users as an additional fee module?
You can turn on the logging for IIS and then use the SQL Server Report Server Pack for IIS. It comes with many canned reports for your sites stats and then you could take it from there with your own custom reports.
You could also just use Log Parser to get the stats into a SQL Server DB, and then you could use SQL from there to analyse the data and roll your own app.
Either way, you could modularize this and sell it as an add-on to your customer base.
You could use Piwik, you just need PHP version 5.1.3 or greater and MySQL version 4.1 or greater. As they say in their website, "Piwik aims to be an open source alternative to Google Analytics."
They have a demo on the official website so you can see if it's what you're looking for.
Google Analytics is a popular service. You just insert a bit of JavaScript containing your site's name on every page, and Google tracks the data and provides all the reports on a handy web-based dashboard.
It's not an ASP.NET MVC module like the one you mentioned, but it will certainly track stats for you and will be a lot simpler to set up than trying to code or integrate anything yourselves.
I'd look at analytics to begin with and only branch out to something more complex if it doesn't meet your requirements.
klabranche provided a holistic answer in terms of using web server logs. I think using the web server log is a great way to analyse data from your web application.
That being said, depending on your web application and the scope of your analytics, relying on the web server log alone is not a good approach.
As you may know, the web log does not record user behaviors, such as clicking certain tabs, that do not trigger a web server request. Your web log has no idea whether users clicked that tab or not, and this may hurt your analysis.
Another thing you need to be aware of is the browser cache, which may create another black hole in your data.
RECAP
If you want to do holistic analytics, you need to use two approaches: one is a JavaScript tag, the other is the web log. Since both of them have shortcomings, combining them will give you a complete picture.
Hope this helps

Resources