why i couldn't see any text in "http://crawlservice.appspot.com/?key=123456&url=http://mydomain.com#!article"? - google-app-engine

Ok, i found this link https://code.google.com/p/gwt-platform/wiki/CrawlerSupport#Using_gwtp-crawler-service that explain how you can make your GWTP app crawlable.
I got some GWTP experience, but i know nothing about AppEngine.
Google said its "crawlservice.appspot.com" can parse any Ajax page. Now I have a page "http://mydomain.com#!article" that has an artice that was pulled from Database. Say that page has the text "this is my article". Now I open this link:
crawlservice.appspot.com/?key=123456&url=http://mydomain.com#!article, then i can see all javascript but I couldn't find the text "this is my article".
Why?
Now let check with a real life example
open this link https://groups.google.com/forum/#!topic/google-web-toolkit/Syi04ArKl4k & you will see the text "If i open that url in IE"
Now you open http://crawlservice.appspot.com/?key=123456&url=https://groups.google.com/forum/#!topic/google-web-toolkit/Syi04ArKl4k you can see all javascript but there is no text "If i open that url in IE",
Why is it?
SO if i use http://crawlservice.appspot.com/?key=123456&url=mydomain#!article then Can google crawler be able to see the text in mydomain#!article?
also why the key=123456, it means everyone can use this service? do we have our own key? does google limit the number of calls to their service?
Could you explain all these things?
Extra Info:
Christopher suggested me to use this example
https://github.com/ArcBees/GWTP-Samples/tree/master/gwtp-samples/gwtp-sample-crawler-service
However, I ran into other problem. My app is a pure GWTP, it doesn't have appengine-web.xml in WEB-INF. I have no idea what is appengine or GAE mean or what is Maven.
DO i need to register AppEngine?
My Appp may have a lot of traffic. Also I am using Godaddy VPS. I don't want to register App Engine since I have to pay for Google for extra traffic.
Everything in my GWTP App is ok right now except Crawler Function.
So if I don't use Google App Engine, then how can i build Crawler Function for GWTP?
I tried to use HTMLUnit for my app, but HTMLUnit doesn't work for GWTP (See details in here Why HTMLUnit always shows the HostPage no matter what url I type in (Crawlable GWT APP)? )

I believe you are not allowed to crawl Google Groups. Probably they are actively trying to prevent this, so you do not see the expected content.

There's a couple points I wish to elaborate on:
The Google Code documentation is no longer maintained. You should look on Github instead: https://github.com/ArcBees/GWTP/wiki/Crawler-Support
You shouldn't use http://crawlservice.appspot.com. This isn't a Google service, it's out of date and we may decide to delete it down the road. This only serves as a public example. You should create your own application on App Engine (https://appengine.google.com/)
There is a sample here (https://github.com/ArcBees/GWTP-Samples/tree/master/gwtp-samples/gwtp-sample-crawler-service) using GWTP's Crawler Service. You can basically copy-paste it. Just make sure you update the <application> tag in appengine-web.xml to the name of your application and use your own service key in CrawlerModule.
Finally, if your client uses GWTP and you followed the documentation, it will work. If you want to try it manually, you must encode the Query Parameters.
For example http://crawlservice.appspot.com/?key=123456&url=http://www.arcbees.com#!service will not work because the hash (everything including and after #) is not sent to the server.
On the other hand http://crawlservice.appspot.com/?key=123456&url=http%3A%2F%2Fwww.arcbees.com%2F%23!service will work.

Related

Possible to access Piwik getVisitorLog through HTTP API?

I'm building some reporting tool. Ideally I want to avoid going through web server logs myself and use (some of) the power of Piwik.
The stuff I get from the visitor log would be a good start, this is at http://example.com/piwik/index.php?module=CoreHome&action=index&idSite=1&period=day&date=yesterday#/module=Live&action=getVisitorLog&idSite=1&period=day&date=yesterday
Unfortunately I can't find a getVisitorLog action in the HTTP API docs at
http://developer.piwik.org/api-reference/reporting-api#Actions (and it's also not an undocumented feature, method=Actions.getVisitorLog gives me
The method 'getVisitorLog' does not exist or is not available in the module '\Piwik\Plugins\Actions\API'.
Is there another way to get to this? Or should I write a plugin for Piwik?
Apparently it is possible through the Live plugin API:
http://developer.piwik.org/api-reference/reporting-api#Live
This works as desired:
http://example.com/piwik/index.php?module=API&method=Live.getLastVisitsDetails&format=JSON&idSite=1&period=day&date=2015-07-21&expanded=1&token_auth=XXXXX&filter_limit=100

How can I tell if I'm including google analytics twice?

I have a web app and I include google analytics. My active users seems to of spiked and I'm incredibly paranoid that I'm somehow double counting my analytics.
Is there any way to see if I'm doing this?
As Nebojsa mentioned, you can inspect source and search for ga.js or analytics.js to see if it's in your application twice.
Look through your source code to see if you have the partial rendering in multiple places (ex. header and footer)
Setup another Google Analytics account and test locally if its double counting your visits. See this post for setting up GA on localhost
Use the Google Analytics Tag Assistant to verify that everything is setup correctly. It will tell you if there are any implementation problems, including multiple tracking codes. It also helps with Adwords, re-marketing and other Google product scripts.
Use the Google Analytics Debugger. This would probably be the most helpful to determine if a single hit is being double counted as it walks you though every single function call the analytics urchin makes.
just open source in the browser and look-up for code of analitics...par example
_gaq.push(['_setAccount', ...

Creating a channel for webRTC video chat

I've been following the HTML5rocks webRTC guide and I have the Javascript set up as described, however the guide is not clear on how to receive a channelToken, roomKey, and User ID. The guide says,
"Note that values used in the JavaScript, such as the room variable and
the token used by openChannel(), are provided by the Google App Engine
app itself: take a look at the index.html template in the repository
to see what values are added."
Unfortunately the link provided is no good and I'm left with very little information regarding the most essential step in this process. The guide isn't clear about whether or not the Google App Engine is a necessary component and I don't see why it should be. I have searched the web in an attempt to find a more useful source, but I was unsuccessful. I also took a look at the webRTC Demo(https://apprtc.appspot[dot]com), that too was no help seeing that the channel information is generated server side. I feel like I should just be able to make a simple http request to some Google server and then run from there. Any information regarding my problem would be much appreciated.
Apologies: the code for this example has been moved to here.
(Been meaning to update the article, but haven't had a chance...)
The apprtc.appspot example uses the Channel API on App Engine for signaling, but there are lots of other ways to do this. Signaling mechanisms are not defined by the WebRTC spec. (Note that signaling, which is accomplished via a signaling service, is the exchange of network and media metadata in order to set up a WebRTC 'call': the actual data is communicated directly between peers.)
We ran a codelab at Google I/O, which describes from start to finish how to build a video chat application that uses Socket.io on Node.js for signaling (it's very simple!) You might want to try that instead.

A website that can test other websites

Is there a way, or is there a web site that can run test cases against another REST server?
On the "tester" website you'd say: do this PUT on this URI, using this authentication mechanism, using this JSON string, with this header, etc. If the result contains XYZ then consider it a success, otherwise consider it a failure...
Other users should be able to add new test cases and edit existing test cases.
I wanted to check before writing a combination of wiki, google app engine, selenium, etc.
Thanks.
Its not what you probably search for, but still probably worth the check: SimpleTestIO
That project's goal is to be easier than Selenium and easy to use. In the future will be probably paid.
Disclaimer: I am not connected with the page, just found it while researching for Selenium alternatives

Subdomain is preventing my search results from rising as it should in page rank

My problem is that I have a site which has requires a dedicated page for every city I choose to support. Early on, I decided to use subdomains rather than a directly after my domain (ie i used la.truxmap.com rather than truxmap.com/la). I realize now that this was a major mistake because Google seems to treat la.truxmap.com as a completely different site as ny.truxmap.com. So for instance, if i search "la food truck map" my site will be near the top, however, if i search "nyc food truck map" im no where in sight because ny.truxmap.com wouldnt be very high in the page rank by itself, and it doesnt have the boost that it ought to be getting from the better known la.truxmap.com
So a mistake I made a year ago is now haunting my page rank. I'd like to know what the most painless way of resolving my dilemma might be. I have received so much press at la.truxmap.com that I can't just kill the site, but could I re-direct all requests at la.truxmap.com to truxmap.com/la and do the same for all cities supported without trashing my current, satisfactory page rank results I'm getting from la.truxmap.com ??
EDIT
I left out some critical information. I am using Google Apps to manage my domain (that is, to add the subdomains) and Google App Engine to host my site. Thus, Google Apps provides a simple mechanism to mask truxmap.appspot.com (the app engine domain) as la.truxmap.com, but I don't see how I can mask it as truxmap.com/la. If I can get this done, then I can just 301 redirect la.truxmap.com to truxmap.com/la as suggested below.
Thanks so much!
You could send a "301 Moved Permanently" redirect to cause the Google crawler to update its references to your site, no?
See this article on 301 redirects and SEO.
You'll need to modify your app as follows:
Add www.truxmap.com as an alias for the app (you can't serve naked domains in App Engine, so just truxmap.com won't work)
Add support to your app for handling URLs of the form www.truxmap.com/something/, routing to the same handlers as the subdomain. You'll need to make sure you've debugged any relative path issues well before continuing.
Modify your app to serve 302 redirects for every url under something.truxmap.com/whatever to www.truxmap.com/something/whatever.

Resources