Ad Blocker w/ Segment.io - analytics

I'm considering using segment.io for several of my client-side 3rd party API needs, but I'm a little concerned about ad-blockers.
My app has no ads, but I do a lot of event-tracking for product analytics, as well as error tracking.
Segment.io offers a nice all-in-one solution, but if it's blocked, and all my eggs are in that basket, then, well, I won't have any eggs left, or however that idiom ends.
So my question is: is there a way to integrate multi-purpose event tracking (segment.io, keen.io, etc.) that isn't as susceptible to ad-blocking?
My app is mostly serverless, using a Firebase+AWS Lambda setup, so I've tried to think of some kind of back-end solution, but no big ideas so far.
ETA: I'm not looking to track ad-blocking users or violate anyone's trust. My question is about event tracking unrelated to a user's identity, and whether that's possible with an all-in-one event-tracking library that might be ad-blocked.

First, I'd typically consider such blocking to be "privacy" blocking rather than ad blocking. So instead of Adblock, it's more likely to come from Ghostery or uBlock Origin.
Although most website uses of analytics are benign (improving conversion rates, catching browser exceptions, etc.), the concern many have is that it allows the third-party analytics sites (including Segment, etc.) to track users across multiple websites. Now, most of these analytics sites are also not interested in that, but better safe than sorry?
The ethics of wanting to have analytics about all your webapp use are far more nuanced than "privacy good, tracking bad" and I don't think this is the forum for it, so I'll provide you a technical answer. Just note that your disclaimer about not wanting to "track adblocking users" is not really valid. If your aim is to gather analytics about them, that's still essentially tracking. Otherwise just use a hosted solution and realise that maybe 10-20% of users don't provide you with analytics.
The bad news: basically every "hosted" analytics solution is or will be in the block lists. Not only are their API hosts directly blocked, but there are also blocks in place based on the names of JS files you may try to include.
The good news: you can work around it if you relay events through your own API, and AWS API Gateway which you may already be using is perfect for this.
There are multiple steps to this.
Step 1: The analytics provider needs to offer a fully bundled/built JS file. If they require you to pull the script dynamically from their own servers, it will be blocked before it even downloads.
Step 2: Rename the bundled script so that it doesn't trigger any filename-based blocks, e.g. rename from mixpanel.umd.js to mp.js, and add it to your server.
Step 3: Create an API gateway to relay events to the "correct" API (e.g. to api.analyticshost.com). You can actually do this with AWS API Gateway alone (no Lambda required) if you pass through the right headers and URL params, as sketched below.
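For illustration, here's a minimal sketch of such a relay in AWS CDK (TypeScript). The stack name and relay path are assumptions; adjust the upstream URL to whichever provider you actually use:
    import * as cdk from 'aws-cdk-lib';
    import * as apigateway from 'aws-cdk-lib/aws-apigateway';
    import { Construct } from 'constructs';

    // Hypothetical stack: forwards any request under /{proxy+} straight to
    // the analytics provider's API (no Lambda involved).
    export class AnalyticsRelayStack extends cdk.Stack {
      constructor(scope: Construct, id: string, props?: cdk.StackProps) {
        super(scope, id, props);

        const api = new apigateway.RestApi(this, 'AnalyticsRelay');
        const proxy = api.root.addResource('{proxy+}');

        proxy.addMethod(
          'ANY',
          new apigateway.HttpIntegration('https://api.analyticshost.com/{proxy}', {
            httpMethod: 'ANY',
            proxy: true, // pass request and response through untouched
            options: {
              requestParameters: {
                // map the greedy path segment onto the upstream URL
                'integration.request.path.proxy': 'method.request.path.proxy',
              },
            },
          }),
          { requestParameters: { 'method.request.path.proxy': true } }
        );
      }
    }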
Step 4: Initialise the library to use your API host rather than the default one.
The result of this is that (a) the browser no longer pulls the analytics script dynamically from the analytics provider's CDN, and instead gets it from your server, and (b) the browser sends events to your API, which relays them to the analytics provider's.
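Putting steps 2 and 4 together on the client, a minimal sketch using Segment's analytics.js as the example (Mixpanel's equivalent option is api_host; the write key and relay host below are placeholders):
    // Assumes the renamed, self-hosted bundle (e.g. /js/a.js) has already
    // been loaded via a <script> tag, exposing the usual global.
    declare const analytics: any;

    analytics.load('YOUR_WRITE_KEY', {
      integrations: {
        'Segment.io': {
          // route tracking calls through your relay instead of api.segment.io
          apiHost: 'abc123.execute-api.us-east-1.amazonaws.com/prod/v1',
        },
      },
    });
    analytics.page();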

When gathering analytics, Segment also provides server-side tracking libraries. These can be quite useful when you want to gather metrics for certain types of events that might be blocked on the client. At its simplest, Segment has an HTTP source, but a number of popular languages are supported as well.
https://segment.com/docs/connections/sources/catalog/#server
The classic example is the order-complete event, which would typically happen on your server once the transaction has been committed to the database. Regardless of browser configuration, you can send this event from the server and track it reliably.
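As a sketch, using Segment's Node library (analytics-node); the helper and field names are illustrative, though "Order Completed" itself comes from Segment's e-commerce spec:
    import Analytics from 'analytics-node';

    const analytics = new Analytics('YOUR_WRITE_KEY');

    // Hypothetical hook: called only after the transaction has been
    // committed to the database, so the event fires exactly once.
    function onOrderCommitted(userId: string, orderId: string, total: number): void {
      analytics.track({
        userId,
        event: 'Order Completed',
        properties: { orderId, total },
      });
    }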
Be sure you respect the user's consent-management settings here, though.

A lot of valid points are already mentioned in the accepted answer; I would add a few technical considerations to minimize ad blockers' impact on tracking tools (Segment, Google Tag Manager, etc.):
Develop for server-side tracking. Whatever runs on the server cannot be blocked by ad blockers. However, this is usually tricky and very custom; Segment gives some examples of it as well.
Use managed client-side proxy solutions like DataUnlocker. This is great and does not require many code changes.
Use self-hosted open-source solutions for proxying Google Analytics and Google Tag Manager like this or this. I believe these solutions can be extended to support Segment as well.

Related

Vaadin on Google Cloud: Production Deployment

I've gone through https://vaadin.com/learn/tutorials/cloud-deployment/google to learn about how to deploy a Vaadin application on GCP.
Now, when I dive into the details, I see that Cloud Run doesn't support session affinity, and although Google App Engine does support it, the documentation says "You should never use session affinity to build stateful applications." So basically, what the tutorial suggests doesn't really work for production use.
So, my question is, what is the recommended approach to run Vaadin application on Google Cloud for production deployments?
I read that a distributed session store is also not an option (https://vaadin.com/blog/session-replication-in-the-world-of-vaadin).
Thank you,
Kristof.
Using any Vaadin flavour (except maybe Fusion) relies on heavy use of server-side state stored in the session (a massive scene graph is stored per client); there is no way around it. If your environment or use case cannot cope with this, then you are better off not using Vaadin. Due to the size of the session data, using a distributed session store without session affinity is discouraged (and the linked blog shows why).
TL;DR: there are no silver bullets.
A late response but could be useful for others.
Vaadin is not really cloud friendly.
There are demos on the Vaadin site of server-side session storage with Hazelcast, but after the first "great, it works" and a few days of effort and research, you realise:
it is not really more than a hello world, and other developers who tried this route faced multiple blocking issues with no answers;
this server-side stored session ends up much too heavy for a large-scale deployment;
as cfrick mentioned, it still requires session affinity for performance reasons.
So my view is: forget it, sticky sessions are the only way with Vaadin.
Now, sticky sessions are not so straightforward to set up in the first place, especially when using push. Then, to go to production, you also need to manage things like "app ready" and "app gone", and so far I've not seen any production-grade answer.

Why do we use REST to connect to a database on a mobile app?

I am currently studying how to make cross-platform mobile apps (with Xamarin.Forms), and I have heard that the "correct" way to connect to a database on a non-local server (in my case located in Azure) is by using REST services (or REST APIs, or whatever they're called), instead of connecting directly to the database with the Server Explorer option of VS like you would do in Windows Forms, for example (using SqlConnection, DataSet, etc., which I think are not necessary in the first case, though I am not sure).
The only answer that I have received about this is that in mobile apps "They are not permanent connections. It connects, gives you data and disconnects. They are Asynchronous connections.", and that this is done "For optimization of connection resources. The mobile is suspended or the user passes the App to the background.".
But I still don't know if this is the actual reason, and if it is I don't understand how it optimizes the connection resources. So if someone has time to explain this I would appreciate it.
Thank you for your time, I hope I have explained myself correctly, and that you all have a great day.
As Jason said, there are the security issues: with proper authorization, having a mediator is definitely much more secure than giving a user direct access to the database, because you restrict them to endpoints which run only the queries you want. As for platform independence and maintenance: if the apps are developed in different languages and on different platforms, there can be a benefit to creating a common REST interface to allow sharing of the data model, caching, etc. For performance and scalability, the HTTP layer of your REST API provides another valuable caching mechanism: your REST API servers can put caching headers on their responses, and those responses can be cached at the network layer, which scales exceptionally well.
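To make the "restricted endpoints" point concrete, here is a minimal Node/Express sketch; the helpers are hypothetical, and any server stack works the same way:
    import express from 'express';

    const app = express();

    // Placeholder token check: in reality, validate a JWT or session token.
    function verifyToken(header?: string): string | null {
      return header === 'Bearer demo' ? 'user-1' : null;
    }

    // Hypothetical data-access helper: one fixed, parameterised query;
    // nothing else in the database is reachable from the client.
    async function findOrderForUser(orderId: string, userId: string) {
      return { id: orderId, owner: userId, status: 'shipped', total: 42 };
    }

    app.get('/api/orders/:id', async (req, res) => {
      const userId = verifyToken(req.header('Authorization'));
      if (!userId) return res.status(401).end(); // no identity, no data

      // The phone never holds a connection string or DB credentials; it can
      // only ask this endpoint, which runs the one query we allow.
      res.json(await findOrderForUser(req.params.id, userId));
    });

    app.listen(3000);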
You could read this link: Why do people do REST APIs instead of DBALs? I think the answers there are pretty good.

Is it possible to get data from other companies' databases?

I was wondering how so many job sites have so many job offers/information regarding other companies' offers. For instance, if I were to start my own job searching engine, how would I be able to get the information that sites like indeed.com have in my own databases? One site (jobmaps.us) says that it's "powered by indeed" and seems to be following the same format as indeed.com (as do all other job searching websites). Is there some universal job searching template that I can use?
Thanks in advance.
Some services offer an API which allows you to "federate" searches (relay them to multiple data sources, then gather all the results together for display in one place). Alternatively, some offer a mechanism that would allow you to download/retrieve data, so you can load it into your own search index.
The latter approach is usually faster and gives you total control but requires you to maintain a search index and track when data items are updated/added/deleted on remote systems. That's not always trivial.
In either case, some APIs will be open/free and some will require registration and/or a license. Most will have rate limits. It's all down to whoever owns the data.
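As a sketch of the federated approach: fan the query out to two hypothetical provider APIs, then merge the results for display in one place (the URLs and response shapes are invented):
    interface JobPosting {
      title: string;
      company: string;
      url: string;
      source: string;
    }

    async function searchProvider(base: string, query: string, source: string): Promise<JobPosting[]> {
      const res = await fetch(`${base}/search?q=${encodeURIComponent(query)}`);
      if (!res.ok) return []; // one failing provider shouldn't sink the whole search
      const items = (await res.json()) as Array<{ title: string; company: string; url: string }>;
      return items.map((job) => ({ ...job, source }));
    }

    async function federatedSearch(query: string): Promise<JobPosting[]> {
      const perProvider = await Promise.all([
        searchProvider('https://api.provider-a.example', query, 'provider-a'),
        searchProvider('https://api.provider-b.example', query, 'provider-b'),
      ]);
      return perProvider.flat();
    }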
It's possible to emulate a user browsing a website by sending HTTP requests and analysing the responses from the web server. By knowing the structure of the HTML, it's possible to extract ("scrape") the information you need.
This approach is often against site policies and is likely to get you blocked. If you do go for this approach, ensure that you respect any robots.txt policies to avoid being blacklisted.
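For completeness, a minimal scraping sketch using cheerio; the URL and the HTML structure it assumes are entirely hypothetical:
    import * as cheerio from 'cheerio';

    // Assumes (hypothetically) that each posting on the page is rendered
    // as <div class="job"><h2>Job title</h2>...</div>.
    async function scrapeJobTitles(url: string): Promise<string[]> {
      const res = await fetch(url);
      const html = await res.text();
      const $ = cheerio.load(html);
      return $('div.job h2')
        .map((_, el) => $(el).text().trim())
        .get();
    }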

Design: Does it make sense to separate the "application server" from the "web server"?

I'd like to build an "application" that'll be consumed by:
Users who go to a website with their browsers to use it.
Integrators who use HTTPS/REST API to interface with it.
Users who run it on their mobile device with a native app.
Putting load balancing or database issues aside for a moment, my initial thought was to architect it with these high level back-end components:
"Application Server" to support both external API for 3rd API integrators (#2 on the top list) and internal API to be consumed by Mobile applications (#3 on the top list) and a single page web application. Let's call this server "app.myapplication.net"
"Website Server" - to support the my company's public website and serve the AngularJS web application pages that make use of the internal API's in "Application Server" to fetch data back an forth. Let's call it "www.myapplication.com"
My motivation is complete separation of front end from back end work.
Is this a popular way to architect this? Does it make sense?
This is indeed the most common method. However, for smaller projects, you likely can start off having them colocated and split later.
There are several advantages to splitting the static content from the application resources. On projects that I have worked on, the pros have been resource usage, release management and, to a lesser degree, browser limitations.
Cookies are expensive. If you colocate the static and application content, every little request will be burdened with carrying cookies. Cookies also mean that the browser will consider that all requests to the server might have different responses for different users, which may defeat caching of resources. If you serve the static content from a domain without cookies, you can ensure that caching works properly.
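As a sketch: serving the static content from its own cookie-free Express host with long-lived caching headers (the directory and port are placeholders, and this assumes content-hashed filenames):
    import express from 'express';

    const app = express();

    // Served from a separate, cookie-free host (e.g. static.myapplication.net):
    // no Set-Cookie is ever emitted here, and the long-lived Cache-Control
    // headers let browsers and intermediaries cache aggressively.
    app.use(
      express.static('public', {
        immutable: true, // safe because filenames are content-hashed
        maxAge: '365d',
      })
    );

    app.listen(8080);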
Resource usage. You are likely already serving static content from other services; the AngularJS script you referred to, for example, is probably loaded from another domain name. Older browsers like this, since they limit the number of concurrent requests to a single web server to 2. It is therefore common to split resources across multiple domains ("domain sharding"), which allows the browser to download in parallel.
Release management. This depends on the team(s) that will be working on the application, but we typically want the application and the front-end JavaScript to be separated, so that front-end programmers and designers can release the static content on its own without having to trigger a release of the application.
Performance. Depending on the application and server, you may also find that you can achieve higher performance if the static server does not have to scan for .htaccess files and similar.
Content Delivery Networks primarily distribute static content so that your customers can fetch heavy files from a closer location, with lower latency. A CDN is probably not needed from day one and you can certainly move to one later, but if you start out with your application on two separate domains, you may find that migration easier when/if you decide to do it.

AngularJS + Breeze + Security

We are trying to use AngularJS with Breeze, with a .NET backend. We have gotten things hooked up and working together. However, we are having trouble figuring out how to lock things down based on the user's role and the user's own data.
Can anyone point us in the general direction? We couldn't find anything explicitly in Breeze's documentation.
There is no reason why Breeze should be insecure. Security is orthogonal. My question remains: what are your concerns?
Update 2 March 2015
Thanks for the clarifying comment ... which reflects concerns that are widely shared. I really am going to have to write about this at length in our documentation.
Fortunately, I believe I can ease your mind about the specific issues you raised.
BreezeJS, the client library, can only reach the data that your server allows the current user to access. It's the server's job to grant or refuse such requests.
This is fundamentally the same story for a client written with any technology talking to a server written with any technology. If the server has a "Customers" endpoint, then a client can request all of your customers and will receive them unless you guard that endpoint with logic on the server. This is true with or without Breeze.
You may be thinking that the metadata describes your entire database schema and therefore exposes the entire database to Breeze client requests. That statement is not true on a couple of counts.
First, even if the client knows about your entire database schema, it can't do anything with that knowledge unless you go to the trouble of exposing every table in your web api with unguarded endpoints. This is entirely within your control and it's not something you can do by accident.
Second, there is no good reason to send metadata that describe your entire database. If you let the server generate the metadata based on the Entity Framework model, you can easily limit the size and shape of that model to the subset of the database that you want to expose in your client-facing api.
After you've narrowed the model and the web api to the size and shape appropriate for your application, you must take the next step ... the step you'd take for any web api imaginable ... which is to guard the endpoints.
At a minimum that means ensuring that the user is authenticated and authorized to make requests of each endpoint. But it also means preventing unwanted responses even to authorized user requests. For example, you might want to limit on the server the number of Customers that can be returned for any given customer query. You might want to throttle the number of requests that you'll process in a single interval of time. You might want to filter the customers down to just those few that the user is allowed to see.
The techniques for doing these things are all part of the ASP.NET Web API itself, having nothing to do with Breeze whatsoever. You'll want to investigate the options that Web API has to offer.
The update side of things is actually much easier to manage with Breeze components for ASP.NET Web API. The conventional Breeze update mechanism is a batch post to a single SaveChanges endpoint. In other words, the surface area of the attack can be limited to a single endpoint. The Breeze SaveChanges method for .NET offers two interception points for security and integrity checks:
BeforeSaveEntity where you can inspect and confirm every entity individually before it gets saved to the database.
BeforeSaveEntities where you can inspect the entire batch as a whole ... to ensure that the save request is cohesive and legitimate. This is something you can't do in a simple REST-ish api where PUT/POST/DELETE requests arrive as separate, autonomous events.
The Breeze query language is highly expressive so it is possible that the client may query the server for something that you were not expecting. The expand clause is probably the most "dangerous" in this regard. Someone can query the Customer endpoint and get their related Orders, their related OrderDetails, the related Products, etc ... all at the same time.
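For example, a Breeze client query along those lines might look like this (the service URL is hypothetical):
    import { EntityManager, EntityQuery } from 'breeze-client';

    const manager = new EntityManager('https://myapp.example/breeze/orders');

    // One request that walks the object graph several levels deep; if the
    // server honours it, the caller gets customers plus their orders,
    // order details and products in a single payload.
    const query = EntityQuery.from('Customers')
      .expand('Orders.OrderDetails.Product');

    manager.executeQuery(query).then((data) => {
      console.log(`${data.results.length} customers (plus related entities)`);
    });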
That is a great power, and with it comes responsibility. You may choose to withhold that power by refusing to allow expand queries. You can also refuse select queries that "expand" by selecting related entities. The ASP.NET Web API makes it easy to impose these restrictions.
Alternatively, you can allow expand in some cases and not others. Or you can inspect the query request within the GET endpoint's implementation and refuse it if it fails your security checks.
You could decide that you don't want certain entity types to be "queryable" at all. You can create just the specialized GET endpoints you need to support safe access to those highly sensitive types. If the supporting methods don't return IQueryable, neither Breeze nor Web API will attempt to turn the OData query parameters into LINQ queries. These endpoints look and behave exactly like the traditional REST-ish apis that are familiar to you now. And the Breeze client will be happy to play along. When you compose a Breeze query, the client doesn't know whether the server will honor that request. With Breeze you can compose any request you want and send it to any HTTP endpoint you desire. You're not limited to OData-style queries.
You don't need ONE approach for querying. You decide what entity types are exposed, how, and under what conditions. You can and should write the guard logic to ensure that the proper data are returned by a query or updated by a save ... just as you would for any web api. Both Breeze and Web API give you a rich set of tools for writing such guard logic. Your options are unbounded.
Finally, I observe that Breeze-oriented apis tend to be much smaller than the typical RESTy api ... that is, they offer fewer endpoints and (in this sense) a smaller surface area. As a practical matter, that means you can concentrate your server-side security budget on fewer methods and potentially improve both the quality of that code and your capacity to scrutinize your api's security risks.