Determining origin of traffic from Referrer Header characteristics - analytics

I'm writing a web application that tracks incoming traffic to a website - the origin of the traffic and its behaviour on our site - so that we can get some idea of the return on investment of our marketing campaigns, the actual keywords and their value to us (rather than to Google), the lost traffic, and our lost spend.
Part of this involves looking at the referrer information from the browser on the first page visited. Referrers like Google Organic and Google Paid Search are easy to identify using regex matching to look for particular strings within the referrer (I'm using PHP's $_SERVER). The same is true for Bing, Ask, Yahoo, LinkedIn and Facebook.
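To illustrate, here's a cut-down sketch of the kind of matching I mean; the host patterns and the paid-vs-organic marker (the 'aclk' check) are simplified placeholders rather than the real rules:

```php
<?php
// Simplified referrer classification. The patterns and labels here are
// illustrative assumptions, not a complete or definitive list.
function classify_referrer(?string $referrer): string
{
    if ($referrer === null || $referrer === '') {
        return 'direct/unknown';
    }

    $host = parse_url($referrer, PHP_URL_HOST) ?: '';
    $path = parse_url($referrer, PHP_URL_PATH) ?: '';

    // Google Content/Display Network ads served through DoubleClick.
    if (stripos($host, 'googleads.g.doubleclick.net') !== false) {
        return 'google content network';
    }

    if (preg_match('#(^|\.)google\.#i', $host)) {
        // Treating an 'aclk' click-through path as paid search is an assumption;
        // adjust to whatever marker your campaigns actually produce.
        return (stripos($path, 'aclk') !== false) ? 'google paid search' : 'google organic';
    }

    if (preg_match('#(^|\.)bing\.com$#i', $host))     return 'bing';
    if (preg_match('#(^|\.)ask\.com$#i', $host))      return 'ask';
    if (preg_match('#(^|\.)yahoo\.com$#i', $host))    return 'yahoo';
    if (preg_match('#(^|\.)linkedin\.com$#i', $host)) return 'linkedin';
    if (preg_match('#(^|\.)facebook\.com$#i', $host)) return 'facebook';

    return 'other referral';
}

$source = classify_referrer($_SERVER['HTTP_REFERER'] ?? null);
```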
But I'm having a problem with one particular source - the Google Content Network. Sometimes traffic coming from these ads has a nice referrer that begins http://googleads.g.doubleclick.net/pagead/ads?, which is obviously easy to code for. On the other hand, traffic from sites showing our ads sometimes comes with the referrer of the site itself, as though it were a hard-coded link. This second, hard-coded type of link is causing problems because we can't differentiate it from regular referred traffic.
So, other than tagging the URLs our ads point to with something like '?source=gcn', or scraping the referring page to look for a hard-coded link or a Google ads iframe, has anyone got any magic sauce to overcome this issue?
Thanks in advance
Ross

So, it seems I've been looking in completely the wrong place for a solution to this.
In a nutshell, the problem is that I need access to Google PPC information about visitors to my site, but Google doesn't always pass this information along in the referrer, and it's particularly problematic where a Display Network ad is inserted into the page's DOM directly with JavaScript.
Where should I have been looking? Google Analytics. The __utmz cookie contains a wealth of information about the route traffic took to get to the site - including whether it came via PPC, organic search or the Display Network, and the search terms (where applicable) that got it there.
See the following page for more information:
http://code.google.com/apis/analytics/docs/concepts/gaConceptsCookies.html
Who'd have thought! Anyway, there is some great documentation on what the cookies do and how they are constructed. Problem Solved.
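For anyone else doing this, here's a minimal sketch of pulling the campaign fields out of __utmz in PHP. The layout (hash.timestamp.sessionCount.campaignNumber.key=value|...) follows the documentation linked above; the field names (utmcsr, utmcmd, utmctr) are the standard ones rather than anything specific to my setup:

```php
<?php
// Minimal __utmz parser based on the documented Google Analytics cookie format.
function parse_utmz(?string $utmz): array
{
    if ($utmz === null || $utmz === '') {
        return [];
    }

    // The first four dot-separated fields are numeric; everything after the
    // fourth dot is the campaign data, which may itself contain dots.
    $parts = explode('.', $utmz, 5);
    if (count($parts) < 5) {
        return [];
    }

    $campaign = [];
    foreach (explode('|', $parts[4]) as $pair) {
        if (strpos($pair, '=') !== false) {
            [$key, $value] = explode('=', $pair, 2);
            $campaign[$key] = urldecode($value);
        }
    }

    return $campaign; // e.g. utmcsr (source), utmcmd (medium), utmctr (keyword)
}

$campaign = parse_utmz($_COOKIE['__utmz'] ?? null);
$medium   = $campaign['utmcmd'] ?? 'unknown';  // 'organic', 'cpc', 'referral', ...
$keyword  = $campaign['utmctr'] ?? null;
```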
Ross

Related

How to solve: This URL is already in use by another Google service (Google Sites)?

I have just verified the custom URL for my Google Sites site. When I want to assign it, it says, "This URL is already in use by another Google service." However, I don't remember using the URL for any other Google service. I just verified it with Google Webmasters. Anyway, I use Plesk for my domain services. Any help?
This is my site: https://sites.google.com/view/alvisyhrn/home
This is my URL: www.alvisyahrin.com
Your help will be much appreciated.
Thank you.
I use Google Domains and was running into the same error message. This post suggests creating and then deleting a synthetic redirect record (e.g. www.alvisyahrin.com -> http://google.com) in Google Domains. This displayed an "All resource records in this synthetic record will be deleted." message before deleting, and it seems to have done the trick: as soon as I deleted the synthetic record, Sites was willing to use it as a custom domain.
I realize you're using a different registrar for your domain, but visiting your site now it looks like you managed to get things working (I assume by doing something like this). Hopefully this will be a helpful breadcrumb for Google Domains users who run into this, at least.

"This page can't load Google Maps correctly."

I have an active Google Cloud Platform account for the purpose of integrating the map functions into my website. Currently I have a map on my contact page (https://voltfuse.com/contact) and on my dealers page (https://voltfuse.com/dealers).
Recently I noticed that the map on my dealers page is no longer working; it tells me it was unable to load correctly. You can see an image of this in the attached "Broken Map.png".
It's strange because the map is working perfectly with the same API on my contact page, as can be seen in "Working Map.png".
The code for the broken page can be found here: https://codeshare.io/5w0ez7
I am wondering if anyone has any ideas as to why the map is not loading.
Thanks,
Alex
Google has recently imposed strict limits on the Google Maps JavaScript API. You must now enter your billing information in the Console to enable higher limits for your account. As of the date of this answer, after entering your billing information, you will receive $200 worth of free usage per month.
See more info at https://developers.google.com/maps/documentation/javascript/usage-and-billing

How to unpublish an iCal (*.ics) feed?

One feature on my site allows registered users to create calendars for their organization. We provide a dynamically-generated iCal feed for these calendars through a URL with query-string parameters. Anyone can subscribe to these feeds by entering the provided URL into Google Calendar, Outlook, iPhone, etc...
This has been working well enough for a few years, but we now have a problem with stale or deleted calendars. If a registered user significantly alters or deletes their account, the calendar will no longer exist and the feed is useless. We currently return a "404 - Not Found" error for those requests (recently changed from "400 - Bad Request").
My question is, other than returning the 404, is there any way to get subscribers to stop requesting a bad feed? This is a similar question, where the accepted answer suggests returning 404 or 410 and hoping the clients will see the error and manually remove the subscription.
That doesn't seem to be working so far. We get ~ 100k feed requests an hour and a full 30% of those are for deleted calendars.
Do Google, Apple, et al not give up when they repeatedly get a 404 for a feed? How have others handled this issue?
If this were just a problem with log pollution I wouldn't worry too much about it. However, since the feeds are dynamically generated, each request hits the backend database. The processing is trivial and doesn't appear to be affecting performance, but the situation can only get worse.
Apologies if this belongs on ServerFault. While the issue affects my servers, I believe the solution is programmatic.
I don't believe there is an easy answer - I think it's been asked before.
It's like having to deal with all the traffic when hackers use your site for target practice on logins or xmlrpc, or just go looking for vulnerabilities. Or spammers trying a scatter-gun approach to sending emails. Or a web spider deciding to excessively crawl your site. You have to size for all that non-useful traffic.
You could generate, and keep up to date, a list of the bad .ics URLs outside of the database, and have a script check it and bounce the request before it gets anywhere near the database.
Basically, try to deal with the problem as efficiently as possible.
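For example, a minimal sketch of that bounce-before-the-database idea, assuming the dead calendar IDs live in a flat file; the file name and the calendar_id parameter are placeholders for however your feed URLs identify a calendar:

```php
<?php
// Early-exit check for dead feeds. 'deleted_calendars.txt' is a flat file
// regenerated whenever an account/calendar is removed (hypothetical name).
$calendarId = $_GET['calendar_id'] ?? '';

$deleted = file('deleted_calendars.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];

if ($calendarId === '' || in_array($calendarId, $deleted, true)) {
    // 410 Gone signals a permanently removed resource; well-behaved clients
    // should eventually stop asking, and either way we never touch the database.
    http_response_code(410);
    header('Content-Type: text/plain');
    echo 'This calendar feed has been removed.';
    exit;
}

// ...fall through to the normal dynamically generated .ics response.
```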
You could also try adding a step to account deletion that asks the user to go to their calendar programs and delete the feed before continuing. However, that might cause bad vibes and probably would not totally fix it anyway.

DoubleClick/YieldManager/Ad.com Tracking Pixels - What exactly do they track?

Our marketing team has placed a lot of these tracking pixels on our site. Most of them just make a simple HTTP GET to a URL, usually via an IMG tag, but some document.write an iframe/script node as well.
What I would like to know, is what exactly these track. Source IP? What if you are behind a proxy?
These sites cause the visitor's browser to go to the tracking site to load the image or JavaScript. What the tracking site does is store a cookie and/or a fingerprint of the visitor. Your site also tells the tracking site something about the visitor - whether they purchased something, or particular aspects of the pages that were visited. The tracking site can then connect this visit with other visits to your sites, other sites, banner ads and more.
It's called a Web bug.
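To make the mechanics concrete, here's a minimal sketch of what such a pixel endpoint typically does on the tracker's side; the cookie name, parameters and logging are illustrative assumptions, not DoubleClick's actual implementation:

```php
<?php
// Sketch of a tracking-pixel endpoint: set/read a long-lived identifier,
// record whatever the embedding page passed in the query string, and return
// a 1x1 transparent GIF. Names and storage here are invented for illustration.
$visitorId = $_COOKIE['track_id'] ?? bin2hex(random_bytes(16));
setcookie('track_id', $visitorId, time() + 60 * 60 * 24 * 365, '/', '', false, true);

// Behind a proxy, REMOTE_ADDR is the proxy's address unless something like
// X-Forwarded-For is passed along (and trusted) upstream.
error_log(json_encode([
    'visitor'  => $visitorId,
    'ip'       => $_SERVER['REMOTE_ADDR'] ?? '',
    'referrer' => $_SERVER['HTTP_REFERER'] ?? '',
    'params'   => $_GET,   // e.g. order id, page category, conversion value
]));

header('Content-Type: image/gif');
echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
```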

How best to screen scrape a password protected site on behalf of a 3rd party?

I want to write a program that analyzes your fantasy baseball team and notifies you of recommended actions, possibly multiple times per day. The problem is, you aren't playing fantasy baseball on my site, you're playing on yahoo, or cbs, or espn, etc.
On the majority of these sites, fantasy teams and leagues are not public, so you must be logged in and a member of the league to see the teams in the league.
All that I need is the plain html for the team page on each of those sites to be sent to my server, where I can then parse and analyze the file and send user notifications.
The problem is that I need username/password combinations to easily get this data to my server when I need it, and I think there will be a lot of people who wouldn't want to entrust their yahoo/espn/cbs password to me.
I have come up with several possible ways to solve this problem:
1. The most obvious way is to ask for their credentials for the site on which their team is hosted. Then I could just programmatically log in and request the data I need. I'm guessing a number of people would be comfortable giving me their credentials, and a number of them not so much.
2. Write a desktop client, which the user then downloads. The client would require their credentials, but it could then basically do exactly the same thing that the server-based version would do: log in, request the page, and send the page back to my server. The difference is that their password would never need to leave their desktop. Their computer would need to be on, and this program running, for this method to work.
3. Write browser add-ons that navigate to the page I need, use the cookie saved from a previous login to log in to the site, and send the page back to my server. This doesn't require my software to ever ask for their password, but if the cookie expires I am hosed, and I don't know much about browser add-ons besides.
I'm sure there are other options, but these are what I've come up with so far.
I have two questions:
1. What are the other possibilities for this type of task?
2. Am I over-estimating people's reluctance to give me their yahoo (for example) password? Is option (1) above the obvious choice?
It was suggested in the comments that I try Yahoo Pipes, and that looked like a promising suggestion, so I explored it a bit. Having now looked at it, I don't think it is an option. So, it looks like I'll be going with option 1.
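For anyone following along, here's a rough sketch of what option 1 looks like with PHP and cURL. The login URL, form field names and team-page URL are placeholders, and each provider's real login flow (redirects, hidden tokens, CAPTCHAs, and their terms of service) will differ:

```php
<?php
// Placeholder credentials supplied by the user of our service.
$username = 'their-login';
$password = 'their-password';

$cookieJar = tempnam(sys_get_temp_dir(), 'jar');

// Step 1: post the credentials and keep whatever session cookies come back.
$ch = curl_init('https://fantasy.example.com/login');
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => http_build_query([
        'username' => $username,
        'password' => $password,
    ]),
    CURLOPT_COOKIEJAR      => $cookieJar,
    CURLOPT_COOKIEFILE     => $cookieJar,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_RETURNTRANSFER => true,
]);
curl_exec($ch);

// Step 2: fetch the team page with the same session and hand the raw HTML
// to whatever parses and analyzes it.
curl_setopt_array($ch, [
    CURLOPT_URL     => 'https://fantasy.example.com/league/12345/team/6',
    CURLOPT_HTTPGET => true,
]);
$teamHtml = curl_exec($ch);
curl_close($ch);
```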
This is a problem I grappled with a couple of years ago when I wanted to do the same thing. Our site is http://benchcoach.com and the options we were considering were the following:
Originally we considered getting the user's credentials and login. We would then log in and scrape their league and team info. The problem there is that, after reading several of the various terms of service, this would definitely be a violation. On top of this, Yahoo! was definitely one of the sites we were considering, and their users have email (where we could get access to sensitive data) and Yahoo! Wallet. In addition, it would be pretty trivial for Yahoo/ESPN/CBS to block our programmatic logins by IP address.
The solution we settled on (we're not 100% happy with it, but it does seem to work) was asking our users to install a bookmarklet (like Delicious, Digg or Reddit use) which would post the current HTML page to our servers, where we could parse the data and load our database. If they were still logged into their Yahoo/ESPN/CBS account, we would direct them straight to the pages; otherwise, those sites would prompt for authentication. Clicking the bookmarklet once more would post the page to our servers.
The pros of this approach were that we never collected anyone's credentials, so any security concerns were alleviated. Secondly, it made it impossible for Yahoo/ESPN/CBS to block access to our service, since we never connected directly to their servers; rather, the user's browser posted the contents of the page to our server.
The problem with this is that it takes 2 clicks to post a page to our site. For head-to-head leagues we needed 3-4 pages, so it would take our user 6-8 clicks to sync their league to our servers. We're still looking at options for this.
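For what it's worth, the receiving end of the bookmarklet can be quite small. A minimal sketch, with an assumed page_html field name and temporary file storage standing in for our real queue:

```php
<?php
// Hypothetical endpoint the bookmarklet POSTs the page to. In the real thing
// the upload would be tied to the logged-in user of our own site.
$html = $_POST['page_html'] ?? '';

if ($html === '' || strlen($html) > 2000000) {
    http_response_code(400);
    exit('Missing or oversized page payload');
}

// Store the raw page for the league/team parser to pick up asynchronously.
$file = sprintf('%s/scrape_%s.html', sys_get_temp_dir(), bin2hex(random_bytes(8)));
file_put_contents($file, $html);

header('Content-Type: application/json');
echo json_encode(['status' => 'received', 'bytes' => strlen($html)]);
```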
One important note is that I ran into the product manager of the Yahoo Fantasy Football site at a conference a year ago. We talked about how we were getting the Yahoo data, and he confirmed that collecting credentials would violate their TOS and that they might stop us. While I don't think they would have, it would have made it hard to invest time and energy developing this only to have them block our site and piss off users by closing their accounts.
A potentially more complicated solution could possibly be built with (for example) Yahoo Pipes.
Hypothetically, you create a pipe which prompts the user for their credentials and provides them with a URL which contains their scraped data. They enter this URL on your site and never have to provide their credentials to you directly. Even better, the security-conscious could examine what the pipe actually does before entering any information.
The downside would be increased complexity (as well as having to write and maintain the pipe). Having said that, you could provide a link directly to the published pipe from your site, to make things as easy as possible.
Option 1 is the obvious choice. People who trust your site will provide the details. There is no other way you can log in to another site while screen scraping.
