Removing index entry from Google mini box - episerver

We've taken over the support of an EPiServer (5 R2) website (.NET) that uses a Google Mini (no longer offered) to display search results on their site.
Unlike many other search solutions, especially those built for EPiServer, the Google Mini indexes the content of the website by crawling its URLs, whereas other solutions hook into EPiServer's PagePublished and PageDeleted events and accept data to store in the index. The problem the client has encountered is that when a page or document is deleted from EPiServer, it is not immediately deleted from the index on the Google Mini. For a period of time the deleted page is therefore still returned in the search results. This is a problem because the client must at times remove pages or documents for legal reasons, so the removal needs to be immediate.
The Google Mini does allow the entry of URLs that should be indexed more frequently than others, but even so the removal will not be immediate.
Can anyone advise whether there are programmatic methods available to work with, e.g. an HTTP DELETE request? Or any other solutions you may have found in the past?
Many thanks
dotdev

Related

AppStats for managed VMs

We were running on App Engine but recently moved over to Managed VMs. For some reason AppStats is no longer available: we just get a 404 Not Found error when browsing to our AppStats URL. Is AppStats not supported on Managed VMs? If not, is there a way of isolating poorly performing endpoints within our application?
One way to isolate poorly performing endpoints is to use the advanced filter search in the GCP Logs Viewer. It is a little hard to find at first.
To get there, in your Google Cloud console, navigate to Logging for your project. At the right of the text box for "Filter by label or text search" you will see a small dropdown arrow. Click that and select "Convert to advanced filter". This will allow you to write your own sql-ish query where you can find requests that took longer than n to complete.
For example, add the following to the filter:
protoPayload.latency>"0.300s"
This will return a list of all requests that took longer than 300 milliseconds to process. If you have Cloud Trace enabled, you can click on the request response time to see the timeline for the individual service calls.
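If you generate such filters from a script, the threshold can be formatted programmatically. This is only a minimal sketch: it assumes nothing beyond the protoPayload.latency field shown above, the helper name latency_filter is made up for illustration, and it does not call any Google API.

```python
def latency_filter(threshold_seconds: float) -> str:
    """Build a filter expression matching requests slower than the threshold."""
    if threshold_seconds < 0:
        raise ValueError("threshold must be non-negative")
    # Latency is written as seconds with an "s" suffix, as in the example above.
    return 'protoPayload.latency>"{:.3f}s"'.format(threshold_seconds)


# Paste the printed expression into the advanced filter box.
print(latency_filter(0.3))
```

Raising the threshold step by step (0.3, 1.0, 3.0, ...) is one way to narrow the list down to the worst endpoints.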

Why is my Google App Engine site over quota?

I'm getting "Over Quota
This application is temporarily over its serving quota. Please try again later." on my GAE app. It's not billing-enabled. I ran a security scan against it today, which presumably triggered the over quota, but I can't explain why based on the information in the console.
Note that 1.59G has been used responding to 4578 requests. That's an average of about 347k per request, but none of my responses should ever be that large.
By filtering my logs I can see that there was no request today whose response size was greater than 25k. So although the security scan generated a lot of small requests over its 14 minute run, it couldn't possibly account for 1.59G. Can anyone explain this?
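To make the mismatch concrete, here is the arithmetic as a short Python sketch. It assumes the console's "G" and the logs' "k" are decimal units (10^9 and 10^3 bytes), which is a guess on my part; the variable names are only for illustration.

```python
TOTAL_BYTES = 1.59e9          # 1.59G reported by the console (assumed decimal)
REQUESTS = 4578               # requests reported by the console
MAX_RESPONSE_BYTES = 25_000   # largest response seen in the logs (25k, assumed decimal)

# Average outgoing bytes per request implied by the console figures.
average = TOTAL_BYTES / REQUESTS

# The most the logged requests could have served if every one hit the 25k maximum.
upper_bound = REQUESTS * MAX_RESPONSE_BYTES

print("average per request (bytes):", round(average))
print("upper bound from logs (bytes):", upper_bound)
print("fraction of quota explained:", upper_bound / TOTAL_BYTES)
```

Even in the worst case the logged responses add up to roughly 114M, well under a tenth of the 1.59G the console reports.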
Note: mostly suppositions ...
The Impact of Security Scanner on logs section mentions:
Some traces of the scan will appear in your log files. For instance,
the security scanner generates requests for unlikely strings such as
"~sfi9876" and "/sfi9876" in order to examine your application's error
pages; these intentionally invalid page requests will show up in your
logs.
My interpretation is that some of the scan requests will not appear in the app's logs.
I guess it's not impossible for some of the scanner's requests to similarly not be counted in the app's request stats, which might explain the suspicious computation results you reported. I don't see any mention of this in the docs to validate or invalidate this theory. However...
In the Pricing, costs, and traffic section I see:
Currently, a large scan stops after 100,000 test requests, not
including requests related to site crawling. (Site crawling requests
are not capped.)
A couple of other quotes from Google Cloud Security Scanner doc:
The Google Cloud Security Scanner identifies security vulnerabilities
in your Google App Engine web applications. It crawls your
application, following all links within the scope of your starting
URLs, and attempts to exercise as many user inputs and event handlers
as possible.
Because the scanner populates fields, pushes buttons, clicks links,
and so on, it should be used with caution. The scanner could
potentially activate features that change the state of your data or
system, with undesirable results. For example:
In a blog application that allows public comments, the scanner may post test strings as comments on all your blog articles.
In an email sign-up page, the scanner may generate large numbers of test emails.
These quotes suggest that, depending on your app's structure and functionality, the number of requests can be fairly high. Your app would need to be really basic for the quoted kinds of activities to be achieved in 4578 requests - kinda supporting the above theory that some scanner requests might not be counted in the app's stats.

How can I tell if I'm including google analytics twice?

I have a web app and I include Google Analytics. My active users seem to have spiked and I'm incredibly paranoid that I'm somehow double counting my analytics.
Is there any way to see if I'm doing this?
As Nebojsa mentioned, you can inspect the page source and search for ga.js or analytics.js to see if it's in your application twice.
Look through your source code to see if you have the partial rendering in multiple places (e.g. header and footer).
Set up another Google Analytics account and test locally whether it's double counting your visits. See this post for setting up GA on localhost.
Use the Google Analytics Tag Assistant to verify that everything is set up correctly. It will tell you if there are any implementation problems, including multiple tracking codes. It also helps with AdWords, remarketing and other Google product scripts.
Use the Google Analytics Debugger. This would probably be the most helpful way to determine whether a single hit is being double counted, as it walks you through every single function call the analytics urchin makes.
Just open the page source in the browser and look for the analytics code, for example:
_gaq.push(['_setAccount', ...
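The manual source search can also be scripted. The following Python sketch counts loader URLs and _setAccount calls in a saved copy of the page's HTML. The function name, the sample markup and the UA-XXXXX-X placeholder are hypothetical, and the patterns only cover the ga.js and analytics.js snippets mentioned above, so treat it as a rough check rather than a definitive test.

```python
import re

# Loader URLs for the classic (ga.js) and Universal (analytics.js) snippets.
LOADER_PATTERN = re.compile(r"google-analytics\.com/(?:ga|analytics)\.js")
# Account setup call used by the classic asynchronous snippet.
ACCOUNT_PATTERN = re.compile(r"_setAccount")


def count_tracking_snippets(html: str) -> dict:
    """Count how often the tracking loader and account setup appear in the HTML."""
    return {
        "loaders": len(LOADER_PATTERN.findall(html)),
        "set_account_calls": len(ACCOUNT_PATTERN.findall(html)),
    }


# Hypothetical page where the snippet was rendered in both header and footer.
sample_page = """
<head><script>_gaq.push(['_setAccount', 'UA-XXXXX-X']);</script>
<script src="http://www.google-analytics.com/ga.js"></script></head>
<footer><script>_gaq.push(['_setAccount', 'UA-XXXXX-X']);</script></footer>
"""
print(count_tracking_snippets(sample_page))
```

A count above one for either key is a hint that the snippet is included more than once and worth a closer look with the debugger.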

Search support for Google App Engine Go runtime

There is search support (experimental) for Python and Java, and eventually Go may also be supported. Until then, how can I do minimal search on my records?
Through the mailing list, I got an idea about proxying the search request to a Python backend. I am still evaluating GAE and have not used backends yet. To set up the search with a Python backend, do I have to send all requests (from Go) to the datastore through this backend? How practical is it, and what are the disadvantages? Is there any tutorial on this?
thanks.
You could make a RESTful Python app with a few handlers, and your Go app would make urlfetches to the Python app. Then you can run the Python app as either a backend or a frontend (with a different version than your Go app). The first handler would receive a key as input, would fetch that entity from the datastore, and then would store the relevant info in the search index. The second handler would receive a query, do a search against the index, and return the results. You would also need a handler for removing documents from the search index, plus handlers for any other operations you want.
Instead of the first handler receiving a key and fetching from the datastore you could also just send it the entity data in the fetch.
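To show the shape of those three operations, here is a rough Python sketch. It is not App Engine code: a plain dict stands in for the search index so the example runs anywhere, the matching is a naive whole-word test, and every name in it is hypothetical. In the real app each function would sit behind a URL handler and use the Search API instead of the dict.

```python
# In-memory stand-in for the search index: doc_id -> indexed text.
INDEX = {}


def handle_index(doc_id: str, text: str) -> None:
    """First handler: store (or replace) the searchable text for an entity."""
    INDEX[doc_id] = text


def handle_search(query: str) -> list:
    """Second handler: return ids of documents containing every query word."""
    words = query.lower().split()
    return sorted(
        doc_id
        for doc_id, text in INDEX.items()
        if all(word in text.lower().split() for word in words)
    )


def handle_delete(doc_id: str) -> None:
    """Third handler: remove a document from the index if it is present."""
    INDEX.pop(doc_id, None)


handle_index("rec-1", "first example record")
handle_index("rec-2", "second example record")
print(handle_search("example record"))
handle_delete("rec-1")
print(handle_search("example record"))
```

Here the entity data is passed in directly, which corresponds to the variant where the Go app sends the data in the fetch rather than just a key.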
You could also use a service like IndexDen for now (especially if you don't have many entities to index):
http://indexden.com/
When making urlfetches keep in mind the quotas currently apply even when requesting URLs from your own app. There are two issues in the tracker requesting to have these quotas removed/increased when communicating with your own apps but there is no guarantee that will happen. See here:
http://code.google.com/p/googleappengine/issues/detail?id=8051
http://code.google.com/p/googleappengine/issues/detail?id=8052
There is full text search coming for the Go runtime very very very soon.

Subdomain is preventing my search results from rising as it should in page rank

My problem is that I have a site which requires a dedicated page for every city I choose to support. Early on, I decided to use subdomains rather than a directory after my domain (i.e. I used la.truxmap.com rather than truxmap.com/la). I realize now that this was a major mistake because Google seems to treat la.truxmap.com as a completely different site from ny.truxmap.com. So for instance, if I search "la food truck map" my site will be near the top; however, if I search "nyc food truck map" I'm nowhere in sight because ny.truxmap.com wouldn't be very high in the page rank by itself, and it doesn't have the boost that it ought to be getting from the better-known la.truxmap.com.
So a mistake I made a year ago is now haunting my page rank. I'd like to know what the most painless way of resolving my dilemma might be. I have received so much press at la.truxmap.com that I can't just kill the site, but could I redirect all requests at la.truxmap.com to truxmap.com/la, and do the same for all supported cities, without trashing the satisfactory page rank results I'm currently getting from la.truxmap.com?
EDIT
I left out some critical information. I am using Google Apps to manage my domain (that is, to add the subdomains) and Google App Engine to host my site. Thus, Google Apps provides a simple mechanism to mask truxmap.appspot.com (the app engine domain) as la.truxmap.com, but I don't see how I can mask it as truxmap.com/la. If I can get this done, then I can just 301 redirect la.truxmap.com to truxmap.com/la as suggested below.
Thanks so much!
You could send a "301 Moved Permanently" redirect to cause the Google crawler to update its references to your site, no?
See this article on 301 redirects and SEO.
You'll need to modify your app as follows:
Add www.truxmap.com as an alias for the app (you can't serve naked domains in App Engine, so just truxmap.com won't work)
Add support to your app for handling URLs of the form www.truxmap.com/something/, routing to the same handlers as the subdomain. You'll need to make sure you've debugged any relative path issues well before continuing.
Modify your app to serve 301 (permanent) redirects for every URL under something.truxmap.com/whatever to www.truxmap.com/something/whatever.
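The URL mapping for that last step could look roughly like the Python sketch below. It only computes the redirect target; sending the response with a permanent (301) status, as suggested earlier in the thread, is left to whatever framework the app uses. The function and constant names are hypothetical, and it assumes every single-label subdomain other than www is a city.

```python
from typing import Optional
from urllib.parse import urlsplit

CANONICAL_HOST = "www.truxmap.com"
BASE_DOMAIN = "truxmap.com"


def redirect_target(url: str) -> Optional[str]:
    """Map city.truxmap.com/path to www.truxmap.com/city/path, or return None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        return None  # Not one of our subdomains.
    city = host[: -len(suffix)]
    if city in ("", "www") or "." in city:
        return None  # Already canonical, or not a simple city subdomain.
    target = f"{parts.scheme}://{CANONICAL_HOST}/{city}{parts.path or '/'}"
    if parts.query:
        target += "?" + parts.query  # Keep the query string intact.
    return target


print(redirect_target("http://la.truxmap.com/whatever"))
```

Requests for which the function returns None are served normally; everything else gets redirected to the returned URL.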

Resources