Infuriating user queries on Google's App Engine - google-app-engine

I have a web service on Google's App Engine that uses Google's user API for authenticating users, managing accounts (including premium service subscriptions), and for managing data ownership. For almost everything, it works really great.
However, very often I need to use the datastore viewer to check on a user's entry in response to a support request, and need to enter GQL to look up someone's account info. The query usually looks like this:
SELECT * FROM UserAttr WHERE user = USER('blah@fish.com')
This should work just fine, but for whatever reason, the USER constructor (?) above is case sensitive, and furthermore, sometimes has weird behavior if the user has a Google account with a gmail.com address. If it's a gmail.com address, sometimes USER('whoever@gmail.com') works, but sometimes USER('whoever') works. It's maddening that I have to try all kinds of different permutations in the GQL console to try and look things up, and I usually give up if the obvious case differences don't work.
Am I doing something totally wrong here, or is the behavior of this really this bad? Any idea if this kind of thing works better in the Python API (that is, if I do a similar request via Python, will it still exhibit this idiotic behavior?). I'd like to avoid writing my own admin pages for this app, if I can make Google's dashboard work for me.
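For reference, a similar request via the Python API would look roughly like the sketch below (assuming the old db API and the UserAttr model above); the email casing still has to match whatever was stored, so it is subject to the same quirks:

# A sketch, assuming the db API and the UserAttr model from the question.
from google.appengine.api import users
from google.appengine.ext import db

target = users.User('blah@fish.com')  # same case-sensitivity caveats apply
results = UserAttr.all().filter('user =', target).fetch(10)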

I'm having the same problem. I figured out that Google doesn't recommend storing users in the datastore at all, since the email address might change:
from here:
Both the db and NDB libraries have UserProperty property types so that applications can store user values. However, since these values become invalid when users change email address, most applications have no good use for this feature.
They probably also mean the case where the user representation changes internally.
I'll let you know if I find anything to solve this.
---- EDIT ----
Here they recommend storing the user's ID for the sake of queries and comparisons. Makes sense...
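A minimal sketch of that recommendation with NDB; the model name is borrowed from the question and the extra field is hypothetical:

from google.appengine.api import users
from google.appengine.ext import ndb

class UserAttr(ndb.Model):
    # store the stable ID instead of a UserProperty
    user_id = ndb.StringProperty(required=True)
    plan = ndb.StringProperty()  # hypothetical attribute

# on write
current = users.get_current_user()
UserAttr(user_id=current.user_id(), plan='premium').put()

# on lookup -- and the same thing is easy to reproduce in the datastore viewer:
# SELECT * FROM UserAttr WHERE user_id = '1234567890'
attrs = UserAttr.query(UserAttr.user_id == current.user_id()).fetch()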

Related

Protecting Firestore without requiring authentication

So currently in the project we have a collection of documents that don't require authentication to be read. They are write/update protected, but everyone can read.
What we are trying to prevent is that someone looks at the Firebase endpoints and somehow manages to scrape the entire collection in JSON format (if this is even possible). The data is public, but I want it to be accessible only from our website.
One of the solutions we could think of was SSR (we are already using Next.js), but implementing SSR just for this reason doesn't seem very enticing.
Any suggestions would be appreciated.
EDIT:
Let me rephrase a little bit.
From what you see in the network tab, is it possible to forge/create a request to Firestore and get the entire collection instead of just the 1 document that was intended?
The best solution in your case is SSR. I know it may not sound enticing, but let's reason about when SSR should be used. In your use case there is an important requirement: security. I think that alone is a strong enough reason to justify using SSR.
Also, creating a dedicated service account for the Next.js app, and securing the data with custom rules so that only that service account can read it, would further improve the overall security.
Lastly, reading the data server side should make your site slightly faster, even if the difference is hard to notice, since we are talking about milliseconds. As it stands, your page has to load before the request to Firebase can be sent, which adds a small delay. If the data is loaded server side, that delay goes away.
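A rough sketch of the server-side read with the Admin SDK (shown in Python for brevity; the Node.js Admin SDK a Next.js app would use is equivalent). The key path and collection name are placeholders:

import firebase_admin
from firebase_admin import credentials, firestore

# service account exported from the Firebase console (placeholder path)
cred = credentials.Certificate('service-account.json')
firebase_admin.initialize_app(cred)
db = firestore.client()

# read the public collection on the server and return only what the page needs;
# the browser never talks to Firestore directly
docs = db.collection('public_docs').limit(20).stream()
payload = [{'id': d.id, **d.to_dict()} for d in docs]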
is it possible to forge/create a request to Firestore and get the entire collection instead of just the 1 document that was intended?
If you want to limit what people can request from a collection, you're looking for security rules. The most common model there is some form of ownership-based access control or role-based access control, but both of those require some way of identifying the user. This could be anonymously (so without them entering credentials), but it'd still be a form of auth.
If you don't want to do that, you can still control how much data can be retrieved through the API in one go. For example, if your security rules allow get but not list, a user can only request a document once they know its ID. Even if you allow list, you can control in the rules what queries are allowed.
I think one approach could be writing a Cloud Function that retrieves this public data using the admin SDK. Then, you could set a rule that nobody can read those documents. This means that only your Cloud Function with the admin SDK will have access to those documents.
Finally, you could set up App Check for that specific Cloud Function; this way, you ensure that the request is coming from your client app only.
https://firebase.google.com/docs/app-check
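A sketch of that idea using the Python Cloud Functions and Admin SDKs; the collection name and the App Check header handling are assumptions rather than a definitive implementation:

import json
import firebase_admin
from firebase_admin import app_check, firestore
from firebase_functions import https_fn

firebase_admin.initialize_app()

@https_fn.on_request()
def public_data(req: https_fn.Request) -> https_fn.Response:
    # verify the App Check token the client attaches (assumed header name)
    try:
        app_check.verify_token(req.headers.get('X-Firebase-AppCheck', ''))
    except Exception:
        return https_fn.Response('Unauthorized', status=401)

    # the Admin SDK bypasses security rules, so the documents can stay
    # locked down against direct client reads
    db = firestore.client()
    docs = db.collection('public_docs').stream()  # hypothetical collection
    return https_fn.Response(json.dumps([d.to_dict() for d in docs]),
                             mimetype='application/json')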

Evernote users in the application database

What's the best practice or the common way of keeping (or not keeping) Evernote users in your application's database?
Should I create my own membership system and create a connection to Evernote accounts?
Should I store Evernote user data (or only part of it) in my own app and let the user log in only with Evernote?
Summary: you must protect their data but how you protect it is up to you. Use the integer edam_userId to identify data.
I think the API License agreement covers protection in the terms:
you agree that when using the API you will not, directly or indirectly, take or enable another to take any of the following actions:...
1.8.4 circumvent or modify any Keys or other security mechanism employed by Evernote or the API;
If you cache people's data and your server-based app lacks security to prevent people looking at others' data, then I think you're pretty clearly violating that clause. I think it's quite elegantly written!
Couple that with the responsibility clause 1.2
You are fully responsible for all activities that occur using your Keys, regardless of whether such activities are undertaken by you or a third party.
So if you don't protect someone's cached data and another user is able to get at it, you're explicitly liable.
Having cleared up the question of your obligations to (as you'd expect) protect people's data, the question is how do you store it?
Clause 4.3 covers identifiers pretty directly, although it's a bit out of date now that we are all forced to use OAuth - there are no passwords ever entered into anything other than a web view. However, mobile or desktop client apps must provide a mechanism for the user to log out, which must completely remove the username and password from your application and its persistent storage.
For a web app, you can't even save the username: If your Application runs as an Internet service on a multi-user server, you must not ask for, view, store or cache the sign-in name or password of Evernote user accounts.
The good news is that you can rely on the edam_userId value which comes back to you in the OAuth token credentials response, as discussed here.
When you look at the Data Model, you can see the unique id under the User and, going into the User struct, see the reassuring definition: "The unique numeric identifier for the account, which will not change for the lifetime of the account."
Thinking about the consequences: as you can't get the user id until you have logged into the service, if you want to provide a local login for people you will have to link your local credentials to the user id. That may irk some people if they have to enter a username twice, but it can't be helped.
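A minimal sketch of that linkage with an entirely hypothetical schema - the only Evernote-specific piece is the integer edam_userId:

import sqlite3

# hypothetical local account table: local credentials live in your app,
# and the Evernote account is referenced only by its stable integer id
conn = sqlite3.connect('app.db')
conn.execute("""
    CREATE TABLE IF NOT EXISTS accounts (
        local_username TEXT PRIMARY KEY,
        password_hash  TEXT NOT NULL,
        edam_user_id   INTEGER UNIQUE  -- from the OAuth response
    )
""")
conn.execute("INSERT OR REPLACE INTO accounts VALUES (?, ?, ?)",
             ('alice', '<password hash>', 123456))
conn.commit()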
You can allow users to log in via OAuth. Here's a guide on how that process works.
But you'll probably also want to store a minimal amount of user data, at least a unique identifier, in your database so you can do things like create relationships between the user and their notebooks and tags. Refer to the Evernote data model for those relationships. If you're using Rails, this will also help you take advantage of Rails conventions.
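As a rough illustration of that flow, here is a sketch using the official Evernote Python SDK; the consumer key/secret, callback URL and sandbox flag are placeholders, and the User.id obtained at the end (the same edam_userId) is the identifier worth persisting:

from evernote.api.client import EvernoteClient

client = EvernoteClient(consumer_key='key', consumer_secret='secret',
                        sandbox=True)  # placeholders

# step 1: get a temporary request token and send the user to Evernote
request_token = client.get_request_token('https://example.com/callback')
authorize_url = client.get_authorize_url(request_token)

# step 2 (in the callback handler): exchange the verifier for an access token
oauth_verifier = 'value from the callback query string'  # placeholder
access_token = client.get_access_token(request_token['oauth_token'],
                                        request_token['oauth_token_secret'],
                                        oauth_verifier)

# step 3: the stable identifier to store in your own database
user = EvernoteClient(token=access_token, sandbox=True).get_user_store().getUser()
evernote_user_id = user.id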

Google apps applications talk to each other

I am looking for a way for two Google Apps applications to talk to each other and share data between each other. I have the following scenario:
Application A logs user in using Google Apps login
Application B logs user in using Google Apps login
then these applications need to communicate directly to each other (server-to-server) using some APIs
The question is: how do these applications verify that the other one is logged in with the same user to Google? I would imagine something like:
- Application A gets some 'token' from Google and sends it to Application B
- Application B verifies that this token is valid for the same Google account as it is logged in with
Is there a way to accomplish that via Google Federated Login? I am talking about the Hybrid protocol here.
Here's a simple way to do it:
You keep everything keyed to the user's Google userid on both applications.
You share the data using HTTP requests that contain the userid.
To prevent leaking of the userids (forbidden by the account API) and to verify the messages really come from the other application, you encrypt the requests with a symmetric cipher such as AES or Blowfish or whatever you like. Both applications have the same key embedded.
You could use public key cryptography, but with just two applications it's not worth it in my opinion. If you start having more apps, public key makes sense.
The fine print: encryption does not guarantee integrity or origin without additional measures. You need to take precautions against replay, for example by incorporating a timestamp or sequence number. You need to take precautions against tampering, e.g. with a checksum. Make sure to use CBC and good initialization vectors. Keep the key secret.
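A sketch of that scheme using PyCryptodome; the message layout (user id plus timestamp) and the HMAC used for the tamper check are my own illustrative choices, not a definitive protocol:

import hashlib, hmac, json, time
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
from Crypto.Util.Padding import pad, unpad

ENC_KEY = b'0' * 32  # shared secret embedded in both apps (placeholder)
MAC_KEY = b'1' * 32  # separate key for the integrity check (placeholder)

def seal(user_id, payload):
    # the timestamp guards against replay; the receiver rejects stale messages
    body = json.dumps({'uid': user_id, 'ts': int(time.time()), 'data': payload}).encode()
    iv = get_random_bytes(16)  # fresh IV for every message
    ct = AES.new(ENC_KEY, AES.MODE_CBC, iv).encrypt(pad(body, 16))
    tag = hmac.new(MAC_KEY, iv + ct, hashlib.sha256).digest()
    return iv + ct + tag

def open_sealed(blob, max_age=60):
    iv, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(MAC_KEY, iv + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError('message was tampered with')
    body = json.loads(unpad(AES.new(ENC_KEY, AES.MODE_CBC, iv).decrypt(ct), 16))
    if time.time() - body['ts'] > max_age:
        raise ValueError('stale message (possible replay)')
    return body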
user.user_id() is always the same across all the apps for the same user. So you can simply compare values returned by user.user_id(). Is this what you are looking for?
Note: Every user has the same user ID for all App Engine applications. If your app uses the user ID in public data, such as by including it in a URL parameter, you should use a hash algorithm with a "salt" value added to obscure the ID. Exposing raw IDs could allow someone to associate a user's activity in one app with that in another, or get the user's email address by coercing the user to sign in to another app.
From docs
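For the hashing that note recommends, something along these lines would do (the salt is a placeholder and must stay server side):

import hashlib, hmac

SALT = b'some-long-random-secret'  # placeholder; keep it out of public code

def obscured_id(user_id):
    # keyed hash of the raw App Engine user ID, safe to expose in a URL
    return hmac.new(SALT, user_id.encode(), hashlib.sha256).hexdigest()

# e.g. obscured_id(users.get_current_user().user_id())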

Determine whether a user is a developer of a facebook app

I'm looking at ways to secure the admin section of my (cakephp powered) Facebook application. To avoid duplicating functionality, I thought it'd be neat to allow access to people who have been flagged as developers in the app settings.
The question could then be: How do I determine whether a user of my Facebook application is a developer?
Alternatively: How do I obtain an array of developer user IDs for my Facebook app?
I tried looking for your answer myself, and the only thing I found that you could possibly do is to make a group private and invite-only to developers and then use the fb:if-is-group-member tag. http://wiki.developers.facebook.com/index.php/Fb:if-is-group-member
OK, so I found out how to do it by myself. Props to Samuel for giving me the idea.
Basically, the way to do it is to run an FQL query that establishes whether a user is an admin of the application's page (page_admin).
SELECT uid FROM page_admin WHERE uid = 286302657 AND page_id = 31290624157
In the PHP client, this returns an array for developers and an empty string for anyone else.
I decided to use FQL rather than the API call because it is possible to preload the FQL to reduce calls to the Facebook servers.
Hope this is useful to somebody.
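For what it's worth, the same query could also be issued from Python through the Graph API's FQL endpoint that existed at the time (FQL has long since been removed, so treat this purely as a historical sketch; the access token is a placeholder):

import requests

FQL = ("SELECT uid FROM page_admin "
       "WHERE uid = 286302657 AND page_id = 31290624157")

resp = requests.get('https://graph.facebook.com/fql',
                    params={'q': FQL, 'access_token': 'ACCESS_TOKEN'})
is_developer = bool(resp.json().get('data'))  # non-empty result means page admin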

How best to screen scrape a password protected site on behalf of a 3rd party?

I want to write a program that analyzes your fantasy baseball team and notifies you of recommended actions, possibly multiple times per day. The problem is, you aren't playing fantasy baseball on my site, you're playing on yahoo, or cbs, or espn, etc.
On the majority of these sites, fantasy teams and leagues are not public, so you must be logged in and a member of the league to see the teams in the league.
All that I need is the plain html for the team page on each of those sites to be sent to my server, where I can then parse and analyze the file and send user notifications.
The problem is that I need username/password combinations to easily get this data to my server when I need it, and I think there will be a lot of people who wouldn't want to entrust their yahoo/espn/cbs password to me.
I have come up with several possible ways to solve this problem:
The most obvious way is to ask for their credentials for the site on which their team is hosted. Then I could just programmatically log in and request the data I need. I'm guessing a number of people would be comfortable giving me their credentials, and a number of them not so much.
Write a desktop client, which the user then downloads. The client would require their credentials, but it could then basically do exactly the same thing that the server based version would do, log in, request the page, and send the page back to my server. The difference being that their password would never need to leave their desktop. Their computer would need to be on, and this program running for this method to work.
Write browser add-ons that navigate to the page I need, use the cookie that is saved from a previous login to login to the site, and send the page back to my server. This doesn't require my software to ever ask for their password, but if the cookie expires I am hosed, and I don't know much about browser add-ons besides.
I'm sure there are other options, but these are what I've come up with so far.
I have two questions:
1. What are the other possibilities for this type of task?
2. Am I over-estimating people's reluctance to give me their yahoo (for example) password? Is option (1) above the obvious choice?
It was suggested in the comments that I try Yahoo Pipes, and that looked like a promising suggestion, so I explored it a bit. Having now looked at it, I don't think that is an option. So, it looks like I'll be going with option 1.
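A bare-bones sketch of what option 1 could look like on the server; the login URL and form field names are entirely hypothetical (every site differs, usually adds CSRF tokens, and, as the answers below point out, this may violate the host site's terms of service):

import requests

def fetch_team_page(username, password):
    s = requests.Session()
    # hypothetical login form; the session object keeps the resulting cookies
    s.post('https://fantasy.example.com/login',
           data={'user': username, 'passwd': password})
    # with the session cookie set, pull the page to parse and analyze
    return s.get('https://fantasy.example.com/league/1234/team/5').text

html = fetch_team_page('someone@example.com', 'their-password')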
This is a problem I grappled with a couple of years ago when I wanted to do the same thing. Our site is http://benchcoach.com and the options we were considering were the following:
Originally we considered getting the user's credentials and logging in ourselves. We would then log in and scrape their league and team info. The problem there is that, after reading several of the various terms of service, this would definitely be a violation. On top of this, Yahoo! was definitely one of the sites we were considering, and their users have email (where we could get access to sensitive data) and Yahoo! Wallet. In addition, it would be pretty trivial for Yahoo/ESPN/CBS to block our programmatic logins by IP address.
The solution we settled on (not 100% happy, but it does seem to work) was asking our users to install a bookmarklet (like Delicious, Digg or Reddit use) which would post the current HTML page to our servers, where we could parse the data and load our database. If they were still logged into their Yahoo/ESPN/CBS account, we would direct them straight to the pages; otherwise, those sites would prompt for authentication. Clicking the bookmarklet once more would post the page to our servers.
The pros of this approach were that we never collected anyone's credentials, so any security concern was alleviated. Secondly, it made it impossible for Yahoo/ESPN/CBS to block access to our service, since we never connect directly to their servers; rather, the user's browser posts the page contents to our server.
The problem with this is that it takes 2 clicks to post a page to our site. For head-to-head leagues we needed 3-4 pages, so it would take our user 6-8 clicks to sync their league to our servers. We're still looking at options for this.
One important note is that I ran into the product manager of the Yahoo Fantasy Football site at a conference a year ago. We talked about how we were getting the Yahoo data, and he confirmed that getting credentials would violate their TOS and that they might stop us. While I don't think they would have, it would have made it hard to justify investing time and energy in developing this, only to have them block our site and piss off users by closing their accounts.
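The server side of that bookmarklet approach can be very small; here is a hypothetical sketch of the receiving endpoint (the framework and field names are my own choices, not how benchcoach.com actually works):

from flask import Flask, request

app = Flask(__name__)

@app.route('/ingest', methods=['POST'])
def ingest():
    # the bookmarklet posts the current page's HTML plus its URL so the
    # server knows which league/team page it is looking at
    page_html = request.form['html']
    source_url = request.form['url']
    parse_and_store(page_html, source_url)  # hypothetical parser
    return 'ok'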
A potentially more complicated approach could be built with (for example) Yahoo Pipes.
Hypothetically, you create a pipe which prompts the user for their credentials and provides them with a URL which contains their scraped data. They enter this URL on your site, and never have to provide their credentials to you directly. Even better, for the security-conscious, it would be possible to examine what the pipe is actually doing before entering any information.
The downside would be increased complexity (and you'd have to write and maintain the pipe). Having said that, you could provide a link directly to the published pipe from your site, to make things as easy as possible.
Option 1 is the obvious choice. People who trust your site will provide the details. There is no other way you can log in to another site while screen scraping.

Resources