How to unpublish an iCal (*.ics) feed? - calendar

One feature on my site allows registered users to create calendars for their organization. We provide a dynamically-generated iCal feed for these calendars through a URL with query-string parameters. Anyone can subscribe to these feeds by entering the provided URL into Google Calendar, Outlook, iPhone, etc.
This has been working well enough for a few years, but we now have a problem with stale or deleted calendars. If a registered user significantly alters or deletes their account, the calendar will no longer exist and the feed is useless. We currently return a "404 - Not Found" error for those requests (recently changed from "400 - Bad Request").
My question is: other than returning the 404, is there any way to get subscribers to stop requesting a bad feed? There is a similar question where the accepted answer suggests returning 404 or 410 and hoping the clients will see the error and manually remove the subscription.
That doesn't seem to be working so far. We get ~100k feed requests an hour, and a full 30% of those are for deleted calendars.
Do Google, Apple, et al not give up when they repeatedly get a 404 for a feed? How have others handled this issue?
If this were just a problem with log pollution, I wouldn't worry too much about it. However, since the feeds are dynamically generated, each request hits the backend DB. The processing is trivial and doesn't appear to be affecting performance, but the situation can only get worse.
Apologies if this belongs on ServerFault. While the issue affects my servers, I believe the solution is programmatic.

I don't believe there is an easy answer - I think it's been asked before.
It's like having to deal with all the traffic when hackers use your site for target practice on logins or xmlrpc, or just probe it for vulnerabilities; or when spammers take a scattergun approach to sending emails; or when a web spider decides to crawl your site excessively. You have to size for all that useless traffic.
You could generate, and keep up to date, a list of the bad .ics URLs outside the database, and have a script check each request against it and bounce it before it gets anywhere near the database.
Basically try to deal as efficiently as possible with the problem.
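A minimal sketch of that pre-check, assuming the calendar is identified by a query-string parameter called `calendar_id` (a hypothetical name) and that the deleted IDs are kept one per line in a plain text file which the account-deletion code appends to. The check runs before any database access and answers with 410 Gone, which says the resource is permanently gone rather than merely not found.

```python
from urllib.parse import parse_qs


def load_deleted_ids(path):
    """Read one deleted calendar ID per line into a set (hypothetical file format)."""
    try:
        with open(path, encoding="utf-8") as fh:
            return {line.strip() for line in fh if line.strip()}
    except FileNotFoundError:
        return set()


def precheck_feed_request(query_string, deleted_ids):
    """Return an HTTP status line if the request can be answered without the DB,
    or None if it should be passed on to the normal feed generator."""
    params = parse_qs(query_string)
    calendar_id = params.get("calendar_id", [None])[0]  # hypothetical parameter name
    if calendar_id is None:
        return "400 Bad Request"
    if calendar_id in deleted_ids:
        # 410 tells well-behaved clients the feed is permanently gone
        return "410 Gone"
    return None
```

Because the set lives in memory, each lookup is a hash check rather than a query; reloading it periodically lets new deletions take effect. Whether a given client unsubscribes on 410 is still up to that client, so this reduces the cost of the traffic rather than stopping it.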
You could also in account deletion try adding a step that requests the user go to their calendar programs and delete the feed before continuing. However that might cause bad vibes and probably would not totally fix it anyway.

Related

User login in Django + React

I have looked through quite a few tutorials (e.g. this, this, and this) on user authentication in a full-stack Django + React web app. All of them simply send username and password received from the user to the backend using a POST request. It seems to me that, if the user leaves the computer unattended for a minute, anyone can grab his password from the request headers in network tools in the browser. Is this a valid concern that must be taken care of? If so, how should these examples be modified? A tutorial / example of the correct approach would be appreciated.
It seems to me that, if the user leaves the computer unattended for a minute, anyone can grab his password from the request headers in network tools in the browser
If the user leaves the computer unattended then what you are describing will probably be the least of his/her worries.
Authentication is a complex topic. If you really do not want to use existing libraries that handle it for you, you will need to spend quite some time getting things right (and even then, zero risk does not exist). The most basic things are never to store plain-text credentials in your DB and to use HTTPS so they are transmitted over an encrypted connection. You can then start thinking about JWTs, avoiding local storage, CSRF and securing cookies, refresh tokens, etc.
You cannot do much, however, about cases like the one you describe, where people give away access to their computers or share their passwords with others, except remind them that they should never do such a thing.
On a side note, if the user didn't have the network monitoring tool open when making the request to your website, opening it afterwards will not show the previously submitted plain-text credentials (there are workarounds to this, however).
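To illustrate the "never store plain-text credentials" point: in a real Django app, `django.contrib.auth` already hashes passwords for you and should be used instead. The standard-library sketch below only shows the underlying idea, a salted, deliberately slow hash (PBKDF2-HMAC-SHA256) stored in place of the password, plus a constant-time comparison at login. The iteration count is an illustrative assumption.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # illustrative work factor; tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest) for storage; the plain-text password is never stored."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, stored_digest):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)
```

Hashing protects the stored credentials; it does nothing about someone reading the request in an unattended browser, and HTTPS is still what protects the password in transit.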

How many App Engine instance hours should I expect?

I have just developed a mobile app that lets users upload and download photos, add, update, search, delete and refresh transactions, and query reports. Every action needs to submit a request to the App Engine server.
I am using Cloud Endpoints, OAuth 2.0 and Objectify to implement the App Engine backend. While testing alone, I have already used up 40% of the instance hours. How much should I expect to be billed for instances if 100 people use this app? How are instance hours calculated: per request submitted, or by the time an instance spends working on multiple requests?
Is it worth it?
If my target is more than 100 users, is it worth it? Could you please explain what exactly I have misunderstood about instances?
Thanks
As others have commented, the question is very hard to answer. The easiest answer I can think of is by looking at the response header "X-AppEngine-Estimated-CPM-US-Dollars". You have to be a member of the Cloud Platform Project (see the Permissions page in Cloud Platform developers console) to see this header (you can check it in your browser).
The header tells you what the cost of the request was in US Dollars multiplied by 1000.
But think of it as an indication. If your request spawns other processes such as tasks, those costs are not included in the number you see in that header.
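As a worked example of reading that header: the value is the request cost multiplied by 1000, so dividing by 1000 gives the approximate cost of one request, and multiplying by an expected request volume gives a very rough monthly figure. The helper below is hypothetical, and the per-user request rate is an assumption you would have to supply. Because it ignores spawned tasks and instance-hour billing, treat the result as a rough and probably low indication.

```python
def estimate_monthly_cost(cpm_usd, requests_per_user_per_day, users, days=30):
    """Rough cost estimate from the X-AppEngine-Estimated-CPM-US-Dollars header.

    cpm_usd is the header value (cost of the request in USD multiplied by 1000).
    Ignores tasks spawned by the request and instance start-up/idle billing.
    """
    cost_per_request = cpm_usd / 1000.0
    total_requests = requests_per_user_per_day * users * days
    return cost_per_request * total_requests
```

With made-up numbers, a header value of 0.05 and 50 requests per user per day for 100 users over 30 days works out to 150,000 requests and about 7.50 USD.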
The relationship between Frontend instance hours and the number of requests is not linear either. For one, you will be charged a number of minutes (not sure if it's 15 minutes) when the instance spins up. And there are other performance settings that determine how this works.
Your best bet is to run the app for a while against real users and find out what the costs were in a given month or so.

Determining origin of traffic from Referrer Header characteristics

I'm writing a web application that will track incoming traffic to a website, including where the traffic came from and how it behaves on our site, so that we can get some idea of the return on investment of our marketing campaigns, the actual keywords and their value to us (rather than to Google), the lost traffic, and our lost spend.
Part of this involves looking at the referrer information from the browser on the first page visited. Referrers like Google Organic and Google Paid Search are easy to identify using regex matching to look for particular strings within the referrer (I'm using PHP's $_SERVER). The same is true for Bing, Ask, Yahoo, LinkedIn and Facebook.
But I'm having a problem with one particular source: Google Content Network. Sometimes traffic coming from these ads has a nice referrer that begins http://googleads.g.doubleclick.net/pagead/ads?, which is obviously easy to code for. On the other hand, traffic from sites showing our ads sometimes arrives with the referrer of the site itself, as though it were a hard-coded link. This second, hard-coded type of link is causing problems because we can't differentiate it from regular referred traffic.
So, other than tagging the URLs our ads point to with something like '?source=gcn', or scraping the referring page to look for a hard-coded link or a Google ads iframe, has anyone got any magic sauce to overcome this issue?
Thanks in advance
Ross
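A minimal sketch of the regex-based referrer classification described in the question, written in Python rather than PHP. The doubleclick prefix comes from the question; the other patterns are illustrative assumptions (in particular, treating a Google URL whose path starts with `/aclk` as paid search is a guess for the example), and real referrer formats vary and change. It also shows why the hard-coded case is unsolvable this way: a Display Network click carrying the publisher's own URL simply falls through to "referral".

```python
import re

# Ordered (pattern, source) pairs; first match wins. Patterns are illustrative
# assumptions, not an authoritative list of referrer formats.
REFERRER_RULES = [
    (re.compile(r"^https?://googleads\.g\.doubleclick\.net/pagead/ads"), "google_content_network"),
    (re.compile(r"^https?://(www\.)?google\.[a-z.]+/aclk"), "google_paid"),  # assumed ad-click path
    (re.compile(r"^https?://(www\.)?google\.[a-z.]+/"), "google_organic"),
    (re.compile(r"^https?://(www\.)?bing\.com/"), "bing"),
    (re.compile(r"^https?://([a-z]+\.)*yahoo\.com/"), "yahoo"),
    (re.compile(r"^https?://(www\.)?linkedin\.com/"), "linkedin"),
    (re.compile(r"^https?://([a-z]+\.)*facebook\.com/"), "facebook"),
]


def classify_referrer(referrer):
    """Map a Referer header value to a traffic-source label."""
    if not referrer:
        return "direct"
    for pattern, source in REFERRER_RULES:
        if pattern.search(referrer):
            return source
    # Includes Display Network clicks that carry the publisher's own URL
    return "referral"
```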
So, it seems I've been looking in completely the wrong place for a solution to this.
In a nutshell, the problem is that I need to access Google PPC information about visitors to my site, but Google doesn't always pass this information along in the referrer, and it is especially problematic where the Display Network appears on a page via JavaScript that inserts the ad directly into the DOM.
Where should I have been looking? Google Analytics. The __utmz cookie contains a wealth of information about the route by which traffic reached the site, including whether visitors came via PPC, Organic or Display Network, and the search terms (where applicable) that got them there.
See the following page for more information:
http://code.google.com/apis/analytics/docs/concepts/gaConceptsCookies.html
Who'd have thought! Anyway, there is some great documentation on what the cookies do and how they are constructed. Problem Solved.
Ross
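For reference, the classic (ga.js) `__utmz` value is a dot-separated prefix (domain hash, timestamp, session count, campaign count) followed by pipe-separated `key=value` campaign fields such as `utmcsr` (source), `utmccn` (campaign), `utmcmd` (medium) and `utmctr` (keyword). A minimal parsing sketch, assuming that format. Newer versions of Google Analytics no longer set `__utmz`, so this only applies to sites still running the legacy ga.js tracking code.

```python
from urllib.parse import unquote


def parse_utmz(cookie_value):
    """Parse a legacy ga.js __utmz cookie value into its campaign fields.

    Expected shape: domainhash.timestamp.sessions.campaigns.key=val|key=val...
    Returns {} if the value does not have that shape.
    """
    parts = cookie_value.split(".", 4)  # field values may themselves contain dots
    if len(parts) != 5:
        return {}
    fields = {}
    for pair in parts[4].split("|"):
        key, sep, value = pair.partition("=")
        if sep:
            fields[key] = unquote(value)  # decode percent-encoded values such as keywords
    return fields
```

In PHP the raw value would come from `$_COOKIE['__utmz']`; the `utmcmd` field is the one that distinguishes organic, paid and referral traffic.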

Pulling Facebook and Twitter status updates into a SQL database via a ColdFusion page

I'd like to set up a ColdFusion page that will pull the status updates from my own Facebook and Twitter accounts and put them in a SQL database along with their timestamps. Whenever I run this page, it should only grab information newer than the most recent timestamp already in the database.
I'm hoping this won't be too bad, because all I'm interested in is status updates and their timestamps. Eventually I'd like to pull other things like images, but for a first test, just status updates are fine. Does anyone have sample code and/or pointers that could assist me in this endeavor?
I'd like any information to relate to the current versions of the APIs (Twitter with OAuth and Facebook Open Graph), if they are necessary. Some solutions I've seen involve creating a Twitter application and a Facebook application to interact with the APIs; is that necessary if all I want to do is access a subset of my own account information? Thanks in advance!
I would read the max(insertDate) from the database and if the API allows you, only request updates since that date. Then insert those updates. The next time you run you'll just need to get the max() of the last bunch of updates before calling for the next bunch.
You could run it every 5 minutes using a ColdFusion scheduled task.
You usually communicate with the API using <cfhttp />. One thing I always do is log every request and response, either in a text file or in a database. That can be invaluable when troubleshooting.
Hope that helps.
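A minimal sketch of the "read max(insertDate), insert only newer updates" approach, written in Python with SQLite instead of ColdFusion so it runs standalone. The table name `status_updates` and its columns are hypothetical, and the Twitter/Facebook fetch is left out: `new_updates` stands in for whatever the API call returns, as (timestamp, text) pairs with ISO-8601 timestamps.

```python
import sqlite3


def ensure_schema(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS status_updates ("
        " source TEXT NOT NULL, posted_at TEXT NOT NULL, body TEXT NOT NULL)"
    )


def last_timestamp(conn, source):
    """Most recent stored timestamp for this source, or None if nothing is stored yet."""
    row = conn.execute(
        "SELECT MAX(posted_at) FROM status_updates WHERE source = ?", (source,)
    ).fetchone()
    return row[0]


def store_new_updates(conn, source, new_updates):
    """Insert only updates newer than what is already stored; return how many were added.

    new_updates: iterable of (iso_timestamp, text) pairs, e.g. from an API call.
    ISO-8601 strings in the same format and time zone compare correctly as text.
    """
    since = last_timestamp(conn, source)
    fresh = [(source, ts, text) for ts, text in new_updates if since is None or ts > since]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO status_updates (source, posted_at, body) VALUES (?, ?, ?)", fresh
        )
    return len(fresh)
```

Where the API accepts a "since"-style parameter, passing `since` to it as well avoids downloading old updates at all; the filter here is then just a safety net.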
Use the cffeed tag to pull RSS feeds from Twitter and Facebook. Retain the date of the last feed scan somewhere (an application variable or the database) and loop over the feed entries. Any entry older than the last scan is ignored; everything else gets committed. Make sure to wrap cffeed in a try/catch, as it will throw errors if the service is down (ahem, Twitter). As mentioned in other answers, set it up as a scheduled task.
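<cftry>
<cffeed action="read" properties="feedMetadata" query="feedQuery" source="http://search.twitter.com/search.atom?q=+from:mytwitteraccount" />
<cfcatch type="any"><cflog type="error" text="cffeed read failed: #cfcatch.message#" /></cfcatch>
</cftry>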
This is a different approach from what you're suggesting, but it worked for us. We had two live events where we asked people to post, in real time, to a bespoke Facebook fan page or to Twitter with a hashtag we endorsed for the event. Then we just fetched and parsed the RSS feeds of the FB page and the Twitter search results on a short interval (I think it was approximately every three minutes), extracting what was new. CFFEED was a little error-prone and wonky; just doing a CFHTTP GET of the RSS feeds and then processing the CFHTTP.filecontent struct item as XML worked fine.
.LAG
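The "fetch the RSS over HTTP and parse it as XML" approach can be sketched with Python's standard library. This assumes an RSS 2.0 feed whose items carry RFC 822 `pubDate` values; the HTTP fetch (CFHTTP in the answer) is left out and the feed body is passed in as a string, and the last-scan time is assumed to be a timezone-aware datetime persisted from the previous run.

```python
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime


def new_items(rss_xml, last_scan):
    """Return sorted (published, title, link) tuples for items published after last_scan.

    rss_xml: the RSS 2.0 feed body as a string.
    last_scan: timezone-aware datetime of the previous successful scan.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        if not pub:
            continue  # cannot tell whether it is new, so skip it
        try:
            published = parsedate_to_datetime(pub)
        except (TypeError, ValueError):
            continue  # unparseable date
        if published.tzinfo is None:
            published = published.replace(tzinfo=timezone.utc)  # assume UTC for "-0000"
        if published > last_scan:
            fresh.append((published, item.findtext("title", ""), item.findtext("link", "")))
    return sorted(fresh)
```

Twitter's old search feed was Atom rather than RSS 2.0, so an Atom source would need `entry`/`updated` elements and the Atom namespace instead.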

How best to screen scrape a password protected site on behalf of a 3rd party?

I want to write a program that analyzes your fantasy baseball team and notifies you of recommended actions, possibly multiple times per day. The problem is, you aren't playing fantasy baseball on my site; you're playing on Yahoo, or CBS, or ESPN, etc.
On the majority of these sites, fantasy teams and leagues are not public, so you must be logged in and a member of the league to see the teams in the league.
All that I need is the plain HTML for the team page on each of those sites to be sent to my server, where I can then parse and analyze the file and send user notifications.
The problem is that I need username/password combinations to easily get this data to my server when I need it, and I think a lot of people won't want to entrust their Yahoo/ESPN/CBS password to me.
I have come up with several possible ways to solve this problem:
1. The most obvious way is to ask for their credentials for the site on which their team is hosted. Then I could just programmatically log in and request the data I need. I'm guessing a number of people would be comfortable giving me their credentials, and a number of them not so much.
2. Write a desktop client, which the user then downloads. The client would require their credentials, but it could then do basically exactly what the server-based version would do: log in, request the page, and send the page back to my server. The difference is that their password would never need to leave their desktop. Their computer would need to be on, with this program running, for this method to work.
3. Write browser add-ons that navigate to the page I need, use the cookie saved from a previous login to access the site, and send the page back to my server. This doesn't require my software to ever ask for their password, but if the cookie expires I am hosed, and besides, I don't know much about browser add-ons.
I'm sure there are other options, but these are what I've come up with so far.
I have two questions:
1. What are the other possibilities for this type of task?
2. Am I overestimating people's reluctance to give me their Yahoo (for example) password? Is option (1) above the obvious choice?
It was suggested in the comments that I try Yahoo Pipes, and that looked like a promising suggestion, so I explored it a bit. Having looked at it now, I don't think it is an option. So it looks like I'll be going with option 1.
This is a problem I grappled with a couple of years ago when I wanted to do the same thing. Our site is http://benchcoach.com and the options we were considering were the following:
Originally we considered getting the user's credentials, logging in, and scraping their league and team info. The problem there is that, after reading the terms of service of several of these sites, this would definitely violate them. On top of this, Yahoo! was definitely one of the sites we were considering, and their users have email (where we could get access to sensitive data) and Yahoo! Wallet. In addition, it would be pretty trivial for Yahoo/ESPN/CBS to block our programmatic logins by IP address.
The solution we settled on (we're not 100% happy with it, but it does seem to work) was asking our users to install a bookmarklet (like those for Delicious, Digg or Reddit) that would post the current HTML page to our servers, where we could parse the data and load our database. If they were still logged into their Yahoo/ESPN/CBS account, we would send them directly to the pages; otherwise, those sites would prompt for authentication. Clicking the bookmarklet once more would post the page to our servers.
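On the server side of the bookmarklet approach, the posted page still has to be parsed before it can be loaded into the database. A minimal sketch using only Python's standard-library `html.parser`, assuming (hypothetically) that the roster is an HTML table whose rows can be flattened into lists of cell text. The real Yahoo/ESPN/CBS markup is not known here and would need site-specific handling on top of this.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being built
        self._cell = None  # text fragments of the cell being built

    def _flush_cell(self):
        if self._cell is not None and self._row is not None:
            # collapse runs of whitespace inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _flush_row(self):
        self._flush_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_row()  # handles an omitted </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()  # handles an omitted </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag == "tr":
            self._flush_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._flush_row()


def extract_rows(html):
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Scraped markup can change without notice, so keeping the site-specific parsing isolated like this makes breakage easier to spot and fix.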
The pros of this approach were that we never collected anyone's credentials, so any security concerns were alleviated. Secondly, it would make it impossible for Yahoo/ESPN/CBS to block access to our service, since we would never connect directly to their servers; rather, the user's browser would post the contents of the page to our server.
The problem with this is that it takes two clicks to post a page to our site. For head-to-head leagues, we needed 3-4 pages, so it would take our users 6-8 clicks to sync their league to our servers. We're still looking at options for this.
One important note is that I ran into the product manager of the Yahoo Fantasy Football site at a conference a year ago. We talked about how we were getting the Yahoo data, and he confirmed that collecting credentials would violate their TOS and that they might stop us. While I don't think they would have, it would have been hard to justify investing time and energy in developing this only to have them block our site and piss off users by closing their accounts.
A potentially more complicated solution could be built with (for example) Yahoo Pipes.
Hypothetically, you create a pipe that prompts the user for their credentials and provides them with a URL containing their scraped data. They enter this URL on your site and never have to provide their credentials to you directly. Even better, for the security-conscious, it would be possible to examine what the pipe actually does before entering any information.
The downside would be increased complexity (and you'd have to write and maintain the pipe). Having said that, you could link directly to the published pipe from your site to make things as easy as possible.
Option 1 is the obvious choice. People who trust your site will provide the details. There is no other way to log in to another site while screen scraping.
