Comparing Google Analytics and Data from our Hosting Provider

Our hosting provider uses an app called AWStats to give us data about visits, visitors, etc. However, when I compare that to my Google Analytics data, the numbers are far apart. For example, AWStats says we had about 5,000 visitors but GA says about 1,500. How can I uncover the source of the disparity?

A few ideas:
Improper implementation of one or both analytics services.
Different definitions of what a visitor is between the two services. For example, does GA say that you have 1,500 unique visitors, or 1,500 visits? Note also that AWStats parses server logs, so it counts every request, including bots and crawlers, while GA only counts clients that execute its JavaScript.
Have you tried using a tool like HttpFox to look at the requests being sent? Are there duplicate requests?
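To get a feel for how much bot traffic inflates log-based counts, you can partition access-log hits by user agent. This is only a rough sketch: the sample log lines, the bot pattern, and the helper name are all illustrative, not exhaustive.

```python
import re

# Sample combined-format access-log lines. A log analyzer like AWStats
# counts every request here; GA would only see the first one, since bots
# do not execute the tracking JavaScript.
LOG_LINES = [
    '1.2.3.4 - - [01/May/2016:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '5.6.7.8 - - [01/May/2016:10:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '9.9.9.9 - - [01/May/2016:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "bingbot/2.0"',
]

# Crude heuristic; real bot detection uses much larger UA lists.
BOT_PATTERN = re.compile(r'bot|crawler|spider|slurp', re.IGNORECASE)

def split_bot_hits(lines):
    """Partition log lines into (human_hits, bot_hits) by user-agent string."""
    humans, bots = [], []
    for line in lines:
        ua = line.rsplit('"', 2)[-2]  # last quoted field is the user agent
        (bots if BOT_PATTERN.search(ua) else humans).append(line)
    return humans, bots

humans, bots = split_bot_hits(LOG_LINES)
print(len(humans), len(bots))  # 1 human hit, 2 bot hits
```

If a large share of log hits match bot user agents, that alone can explain a 5,000-vs-1,500 gap.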

Related

Google Monitoring API : Get Values

I'm trying to use the Google Monitoring API to retrieve metrics about my cloud usage. I'm using the Google Client Library for Python.
The API advertises the ability to access over 900 Stackdriver Monitoring Metrics. I am interested in accessing some Google App Engine metrics, such as Instance count, total memory, etc. The Google API Metrics page has a list of all the metrics I should be able to access.
I've followed the guides on the Google Client Library page, but my script making the API calls is not printing the metrics; it is just printing the metric descriptions.
How do I use the Google Monitoring API to access the metrics, rather than the descriptions?
My Code:
import json

from oauth2client.service_account import ServiceAccountCredentials
from apiclient.discovery import build
...
response = monitor.projects().metricDescriptors().get(
    name='projects/{my-project-name}/metricDescriptors/appengine.googleapis.com/system/instance_count').execute()
print(json.dumps(response, sort_keys=True, indent=4))
My output is just the metric descriptor. I expect to see the actual instance count. How can I achieve this?
For anyone reading this, I figured out the problem. I was assuming the values would come from the metricDescriptors part of the API, but that was a poor assumption.
For values, you need to use a timeSeries call. For this call, you need to specify the project you want to monitor, a start time, an end time, and a filter naming the metric you want (CPU, memory, etc.).
So, to retrieve the app engine project memory, the above code becomes
request = monitor.projects().timeSeries().list(
    name='projects/my-appengine-project',
    interval_startTime='2016-05-02T15:01:23.045123456Z',
    interval_endTime='2016-06-02T15:01:23.045123456Z',
    filter='metric.type="appengine.googleapis.com/system/memory/usage"')
response = request.execute()
This example uses a start time and end time that cover a month of data.
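The interval parameters above are plain RFC3339 timestamps, so a small helper can build a month-long window ending at a given moment. This sketch is mine (the name `month_interval` is not part of the client library):

```python
from datetime import datetime, timedelta

def month_interval(end=None):
    """Build RFC3339 start/end timestamps covering the last 30 days,
    in the format the timeSeries.list interval parameters expect."""
    end = end or datetime.utcnow()
    start = end - timedelta(days=30)
    fmt = '%Y-%m-%dT%H:%M:%S.%fZ'
    return start.strftime(fmt), end.strftime(fmt)

start, end = month_interval(datetime(2016, 6, 2, 15, 1, 23))
print(start, end)  # 2016-05-03T15:01:23.000000Z 2016-06-02T15:01:23.000000Z
```

The resulting strings can be passed directly as `interval_startTime` and `interval_endTime` in the call above.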

How can I tell if I'm including google analytics twice?

I have a web app and I include Google Analytics. My active users seem to have spiked, and I'm worried that I'm somehow double counting my analytics.
Is there any way to tell if I'm doing this?
As Nebojsa mentioned, you can inspect the page source and search for ga.js or analytics.js to see if either appears in your application twice.
Look through your source code to see if you have the partial rendered in multiple places (e.g. header and footer).
Set up another Google Analytics account and test locally whether it's double counting your visits. See this post for setting up GA on localhost.
Use the Google Analytics Tag Assistant to verify that everything is set up correctly. It will tell you if there are any implementation problems, including multiple tracking codes. It also helps with AdWords, remarketing, and other Google product scripts.
Use the Google Analytics Debugger. This is probably the most helpful for determining whether a single hit is being double counted, as it walks you through every function call the analytics urchin makes.
Just open the page source in the browser and look for the analytics code, for example
_gaq.push(['_setAccount', ...
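The "search the source for the tracker" advice can be automated with a simple scan for the known snippet names. A minimal sketch (the sample HTML and the helper name are illustrative):

```python
# Hypothetical page source with the tracker pasted twice, e.g. once from a
# header partial and once from a footer partial.
HTML = """
<script src="https://www.google-analytics.com/analytics.js"></script>
<script>ga('create', 'UA-12345-1', 'auto');</script>
<script src="https://www.google-analytics.com/analytics.js"></script>
<script>ga('create', 'UA-12345-1', 'auto');</script>
"""

# Snippet markers for the classic (ga.js) and universal (analytics.js) trackers.
TRACKER_PATTERNS = ['analytics.js', 'ga.js', "_gaq.push(['_setAccount'"]

def count_trackers(html):
    """Count occurrences of each known tracker snippet in the page source."""
    return {p: html.count(p) for p in TRACKER_PATTERNS}

counts = count_trackers(HTML)
duplicated = [p for p, n in counts.items() if n > 1]
print(duplicated)  # ['analytics.js'] -> the snippet is included twice
```

Any pattern with a count above one is a candidate for double counting and worth confirming with the Tag Assistant or Debugger mentioned above.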

"Over quota" when using GCS json-api from App Engine

I am using Go on App Engine. In most cases I use the file API to access GCS, which works great, except that deletes don't work, so to delete files I use the JSON API (specifically, the google-go-api-client). To authenticate, I use App Engine service accounts. We sometimes see an error of "Over quota:" with nothing after the colon. Since we are a paid app, what quota could this be? Is there a burst limit (e.g. no more than X requests in a single minute)? Is there anywhere such quotas are documented?
The caching mechanism is broken for goauth2 and serviceaccount tokens. You can see the issue I created here for more detail: https://code.google.com/p/goauth2/issues/detail?id=28
I came across an "over quota" issue myself when requesting more than 60 service account tokens a minute. I opened a ticket with App Engine support (I pay for the silver package) and got this undocumented information out of them.
You can apply the patch yourself in your $GOPATH/src/code.google.com/p/goauth2/appengine/serviceaccount/cache.go file. This fixed the issue you described for my team.
I ran into the same problem and found two causes:
1. Daily budget
2. Logs retention
Solution:
For problem 1, increase the daily budget.
For problem 2, increase the log retention from 1 GB to a higher value.

Universal Google Analytics on/off line syncing

We have an AngularJS app and we want to track how Android users use it in the field. The app can be used offline for 2-3 days and syncs opportunistically when Wi-Fi or 3G/4G connections are available.
Our goal is to track user behaviour with UA and cache usage data to send at the first Wi-Fi opportunity.
Does UA have built-in support for this?
If not, can we do this programmatically using the v3 APIs, i.e. cache data locally and use a service to sync it when online?
In either case, how much data can UA cache?
How many days can the data stay cached locally before syncing?
Cheers,
T
While UA has support for this, it does not extend to multiple days: according to the parameter reference for the measurement protocol (which is the basis for UA), "queue time" is four hours at most.
Using the API (which is built on top of the measurement protocol) will not change that.
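Given the four-hour queue-time cap, an offline cache has to drop hits older than that rather than replay them. A minimal sketch of the triage step (all names are mine; `qt` is the measurement protocol's queue-time parameter, expressed in milliseconds):

```python
from datetime import datetime, timedelta

QT_MAX = timedelta(hours=4)  # measurement-protocol queue-time limit

def sendable(cached_hits, now):
    """Split cached (timestamp, payload) hits into those still within the
    four-hour window (sendable with a qt= offset) and those too stale."""
    fresh, stale = [], []
    for ts, payload in cached_hits:
        delay = now - ts
        if delay <= QT_MAX:
            qt_ms = int(delay.total_seconds() * 1000)  # qt is in milliseconds
            fresh.append((payload, qt_ms))
        else:
            stale.append(payload)
    return fresh, stale

now = datetime(2016, 6, 1, 12, 0, 0)
hits = [
    (datetime(2016, 6, 1, 11, 0, 0), 'pageview-a'),  # 1 h old: sendable
    (datetime(2016, 6, 1, 4, 0, 0), 'pageview-b'),   # 8 h old: too stale
]
fresh, stale = sendable(hits, now)
print(fresh, stale)
```

For a 2-3 day offline window, the stale bucket will dominate, which is why the built-in support does not cover this use case.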

Options for Filtering Data in real time - Will a rule engine based approach work?

I'm looking for options/alternative to achieve the following.
I want to connect to several data sources (e.g., Google Places, Flickr, Twitter ...) using their APIs. Once I get some data back I want to apply my "user-defined dynamic filters" (defined at runtime) on the fetched data.
Example Filters
Show me only restaurants that have a rating above 4 AND more than 100 ratings.
Show all tweets that are X miles from location A and Y miles from location B.
Is it possible to use a rule engine (esp. Drools) to do such filtering? Does it make sense?
My proposed architecture is mobile devices connecting to my own server, which then dispatches requests to the external world and does all the heavy work (mainly the filtering) based on user preferences.
Any suggestions/pointers/alternatives would be appreciated.
Thanks.
Yes, Drools Fusion allows you to easily deal with this kind of scenario. Here is a very simple example application that plays around with twitter messages using the twitter4j API:
https://github.com/droolsjbpm/droolsjbpm-contributed-experiments/tree/master/twittercbr
Please note that there are online and offline versions in that example. To run the online version you need to get access tokens on the Twitter home page and configure them in the configuration file:
https://github.com/droolsjbpm/droolsjbpm-contributed-experiments/blob/master/twittercbr/src/main/resources/twitter4j.properties
Check the twitter4j documentation for details.
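If a full rule engine feels heavyweight, the core idea of user-defined filters assembled at runtime can also be sketched with plain composable predicates (Python here for illustration; Drools expresses the same conditions declaratively as rules). All names in this sketch are mine:

```python
def make_filter(field, op, value):
    """Build a predicate from a runtime-supplied (field, operator, value) triple."""
    ops = {
        '>': lambda a, b: a > b,
        '<': lambda a, b: a < b,
        '>=': lambda a, b: a >= b,
        '==': lambda a, b: a == b,
    }
    return lambda item: ops[op](item[field], value)

def apply_filters(items, filters):
    """Keep items matching ALL filters (AND semantics, as in the example)."""
    return [i for i in items if all(f(i) for f in filters)]

# "Restaurants with a rating above 4 AND more than 100 ratings":
restaurants = [
    {'name': 'A', 'rating': 4.5, 'rating_count': 250},
    {'name': 'B', 'rating': 4.8, 'rating_count': 40},
    {'name': 'C', 'rating': 3.9, 'rating_count': 500},
]
filters = [make_filter('rating', '>', 4), make_filter('rating_count', '>', 100)]
print(apply_filters(restaurants, filters))  # only restaurant 'A' passes
```

A rule engine like Drools adds what this sketch lacks: efficient incremental matching over streams of facts (Drools Fusion's event processing), which matters once filters and data volumes grow.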
