Best practices to search xacml policy on PDP side? - request

I have read a lot about ABAC and its benefits, but what I can't comprehend is how exactly the involved parties do their work.
I am creating REST API microservices in C++ and I want to secure all API requests using ABAC. I understand that I need a PEP, PDP, PIP, etc., and I understand the general idea of what each service does. But I have some questions about an issue I am facing, and I need to understand whether there is a standard way to handle it or whether it just depends on my imagination.
I am not going to use XACML (XML) to store the policy because my company prefers that policies be stored in a database or JSON format.
After forming the XACML request on the PEP side and sending it to the PDP, how do I search the policies stored on the PDP side with this request, given that (if I understand correctly) not all PolicySets have targets, not all Policies have targets, and the same goes for Rules?
Do I have to use regex to match data from the request against the policies on the PDP? And if regex is to be used, how can I deal with PolicySets that have no targets, as mentioned above, or with multiple targets in the same branch?

Related

Online Secure Message Centre Design

I've got a requirement for an Online Customer Portal Secure 'Message Centre' to allow the back and front office to communicate with their customers in a two way fashion once the Customer has logged in via a secure channel.
We have procured a CMS platform with this widget presentation layer out of the box that expects to connect to an API to handle the communication and persistence i.e., the CMS is stateless.
I was wondering how people have designed and solutioned this - my current thinking:
1. Shoehorn it into our backend CRM system via a REST API - this would need custom dev
2. Use an RDBMS (custom DB data model adhering to the message structure) and build a REST API over the DB to handle the customer interaction events i.e., read, delete, new message
3. Build a pure microservice architecture with persistence coupled to the service - i.e., adhering to the pattern - engineering wise we don't have this capability yet
4. Other obvious solution that I have missed?
Am sure this has been solved multiple times over, keen to hear what works best?
*One thing I forgot to mention, is that we are migrating from an old legacy system and will need to bring about 10GB of customer messages with us i.e., historical data; this data needs to migrate into the new solution.
Many thanks
However you implement the back-end, the key here is to spend time getting your REST interfaces 'right' before doing any coding. Try to break down the interfaces into small, specialized interfaces that each service a specific business-focused responsibility. Also, think about the data model abstraction and its representation in the HTTP payload, and how to cross-reference other data using links embedded in the data transferred over the interface. If you get the interfaces right, then you can swap out the implementation down the line.
It is impossible to say what the best way to go is without a deep analysis of the options. Unfortunately you haven't really explained the full extent of the API required or the capabilities of your existing CRM, but I am assuming there would be useful business advantages to option 1, as it integrates with your existing systems and business process. Options 2 and 3 would need your office staff to use a different system, requiring training/support, which to my mind doesn't seem ideal. Option 3 requires a significant amount of work (not just coding, but integration testing, deployment, orchestration etc!), and from your description of the task, it is not clear that there really is a need to go down this route. My very high level hunch is option 1, but you will obviously need to research whether there is an appropriate mapping between the API you present to the CMS and the API that is available on the CRM. Also bear in mind the security model of the CRM and of course responsiveness/throughput.

Is it possible to get data from other companies' databases?

I was wondering how so many job sites have so many job offers/information regarding other companies' offers. For instance, if I were to start my own job searching engine, how would I be able to get the information that sites like indeed.com have in my own databases? One site (jobmaps.us) says that it's "powered by indeed" and seems to be following the same format as indeed.com (as do all other job searching websites). Is there some universal job searching template that I can use?
Thanks in advance.
Some services offer an API which allows you to "federate" searches (relay them to multiple data sources, then gather all the results together for display in one place). Alternatively, some offer a mechanism that would allow you to download/retrieve data, so you can load it into your own search index.
The latter approach is usually faster and gives you total control but requires you to maintain a search index and track when data items are updated/added/deleted on remote systems. That's not always trivial.
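As a rough illustration of the "track when data items are updated/added/deleted" part, here is a minimal sketch. It assumes (my assumption, not something any particular provider promises) that the remote system can give you a listing of item ids with some version marker such as a hash or revision number, and that you keep the same mapping for your local index. The function name and data shapes are made up for the example.

```python
def diff_index(local, remote):
    """Compare two {item_id: version} mappings.

    Returns three sorted lists: ids to add to the local index,
    ids whose version changed, and ids that vanished remotely.
    """
    added = sorted(remote.keys() - local.keys())
    deleted = sorted(local.keys() - remote.keys())
    updated = sorted(
        item_id
        for item_id in remote.keys() & local.keys()
        if remote[item_id] != local[item_id]
    )
    return added, updated, deleted


# Hypothetical snapshots: what we indexed last time vs. what the source lists now.
local_index = {"job-1": "v1", "job-2": "v1", "job-3": "v1"}
remote_listing = {"job-2": "v2", "job-3": "v1", "job-4": "v1"}

added, updated, deleted = diff_index(local_index, remote_listing)
```

The non-trivial part in practice is usually getting a reliable listing at all; many sources only expose "recently changed" feeds, in which case deletions are the hard case.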
In either case, some APIs will be open/free and some will require registration and/or a license. Most will have rate limits. It's all down to whoever owns the data.
It's possible to emulate a user browsing a website, sending HTTP requests and analysing the response from a web server. By knowing the structure of the HTML, it's possible to extract ("scrape") the information you need.
This approach is often against site policies and is likely to get you blocked. If you do go for this approach, ensure that you respect any robots.txt policies to avoid being blacklisted.
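If you do scrape, checking robots.txt before each request can be done with Python's standard library. This is only a sketch: the robots.txt content, the user agent name and the URLs below are made up, and a real crawler would download the file from the target site rather than hard-code it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, user_agent, url):
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
```

Respecting robots.txt does not by itself make scraping permitted under a site's terms; it only avoids the paths the site has explicitly asked crawlers to stay out of.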

Ad Blocker w/ Segment.io

I'm considering using segment.io for several of my client-side 3rd party API needs, but I'm a little concerned about ad-blockers.
My app has no ads, but I do a lot of event-tracking for product analytics, as well as error tracking.
Segment.io offers a nice all-in-one solution, but if it's blocked, and all my eggs are in that basket, then, well, I won't have any eggs left, or however that idiom ends.
So my question is: is there a way to integrate multi-purpose event tracking (segment.io, keen.io, etc.) that isn't as susceptible to ad-blocking?
My app is mostly serverless, using a Firebase+AWS Lambda setup, so I've tried to think of some kind of back-end solution, but no big ideas so far.
ETA: I'm not looking to track adblocking users or violate anyone's trust. my question is about event-tracking unrelated to a user's identity, and whether or not that's possible in an all-in-one event tracking library that might be ad-blocked.
First, I'd typically consider such blocking to be "privacy" blocking instead of ads. So instead of Adblock it's more likely to be Ghostery or uBlock Origin.
Although most website uses of analytics are benign (improving conversion rates, catching browser exceptions, etc), the concern many have is that it allows the third party analytics sites (including segment, etc) to track users across multiple websites. Now most of these analytics sites are also not interested in that, but better safe than sorry?
The ethics of wanting to have analytics about all your webapp use are far more nuanced than "privacy good, tracking bad" and I don't think this is the forum for it, so I'll provide you a technical answer. Just note that your disclaimer about not wanting to "track adblocking users" is not really valid. If your aim is to gather analytics about them, that's still essentially tracking. Otherwise just use a hosted solution and realise that maybe 10-20% of users don't provide you with analytics.
The bad news: basically every "hosted" analytics solution is or will be in the block lists. Not only are their API hosts directly blocked, but there are also blocks in place based on the names of JS files you may try to include.
The good news: you can work around it if you relay events through your own API, and AWS API Gateway which you may already be using is perfect for this.
There are multiple steps to this.
Step 1: The analytics provider needs to provide the option of a fully bundled/built JS file. If they require you to pull the script dynamically from their own servers then it will be blocked there before it even downloads.
Step 2: Rename the bundled script so that it doesn't trigger any filename-based blocks, e.g. rename from mixpanel.umd.js to mp.js, and add it to your server.
Step 3: Create an API gateway to relay events back to the "correct" API (e.g. to api.analyticshost.com). You can actually do this with AWS API gateway only (no lambda required) if you pass through the right headers and URL params.
Step 4: Initialise the library to use your API host rather than the default one.
The result of this is that (a) the browser no longer needs to dynamically pull the analytics script from the analytics provider's CDN, and instead gets it from your server, and (b) the browser sends events to your API, which then relays them through to the analytics provider's API.
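To make the relay in step 3 concrete, here is a minimal sketch of the rewriting a relay has to do, written as a plain Python function (for example, something you might call from a Lambda if you don't use a pure API Gateway passthrough). The upstream host is the placeholder from step 3, and the list of headers to drop is my assumption; check what your provider actually expects.

```python
from urllib.parse import urlencode

# Placeholder host taken from step 3; substitute your provider's real API host.
UPSTREAM_HOST = "api.analyticshost.com"

# Headers that describe the hop to our own gateway and should not be forwarded
# (an assumption for this sketch, not a complete list).
HOP_HEADERS = {"host", "connection", "content-length"}


def build_upstream_request(path, query, headers):
    """Map an incoming request on our API to the provider's URL and headers."""
    url = f"https://{UPSTREAM_HOST}{path}"
    if query:
        url += "?" + urlencode(query)
    forwarded = {
        name: value
        for name, value in headers.items()
        if name.lower() not in HOP_HEADERS
    }
    return url, forwarded


url, forwarded = build_upstream_request(
    "/track",
    {"v": "1"},
    {"Host": "api.myapp.example", "Content-Type": "application/json"},
)
```

The body is passed through untouched; only the destination and the hop-specific headers change.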
When gathering analytics, Segment also provides server-side tracking libraries. This can be quite useful when you want to gather metrics for certain types of events that might be blocked by users on the client. At its simplest, Segment has an HTTP Source, but there are libraries for a number of popular languages as well.
https://segment.com/docs/connections/sources/catalog/#server
The classic example is the order complete event: this would typically happen on your server once the transaction has been committed to a database. Regardless of browser configuration, you could send this event from the server and track it.
Be sure you respect the user's consent management settings here, though.
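A minimal sketch of what a server-side order event could look like, using only the Python standard library and building (not sending) the HTTP request. It assumes Segment's HTTP Tracking API at https://api.segment.io/v1/track with the source write key sent as the Basic auth username; verify both against the docs linked above. The write key, user id, event name and the `consented` flag are hypothetical stand-ins for whatever your app uses.

```python
import base64
import json
from urllib.request import Request

# Assumed endpoint of Segment's HTTP Tracking API; confirm in the Segment docs.
SEGMENT_TRACK_URL = "https://api.segment.io/v1/track"


def build_track_request(write_key, user_id, event, properties, consented):
    """Build a track request, or return None when the user has not consented."""
    if not consented:
        return None
    # Basic auth with the write key as username and an empty password.
    token = base64.b64encode(f"{write_key}:".encode()).decode()
    body = json.dumps(
        {"userId": user_id, "event": event, "properties": properties}
    ).encode()
    return Request(
        SEGMENT_TRACK_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values; call this after the order transaction has committed.
request = build_track_request(
    "WRITE_KEY", "user-42", "Order Completed", {"total": 19.99}, consented=True
)
```

Passing the result to `urllib.request.urlopen` would send it; in a real service you would more likely use Segment's own server library and queue the call so a slow analytics endpoint cannot delay the order response.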
A lot of valid points are already mentioned in the accepted answer. I would mention a few technical considerations to minimize ad blockers' impact on tracking tools (Segment, Google Tag Manager, etc):
Develop for server-side tracking. Whatever is on the server cannot be blocked by ad blockers. However, this is usually tricky and very custom; Segment gives some examples of it as well.
Use managed client-side proxy solutions like DataUnlocker. This is great and does not require many code changes.
Use self-hosted open-source solutions for proxying Google Analytics and Google Tag Manager like this or this. I believe these solutions can be extended to support Segment as well.

AngularJS + Breeze + Security

We are trying to use AngularJS with Breeze, with a .NET backend. We have gotten things hooked up working together. However, we are having trouble figuring out how to lock things down based on the user role and the user's own data.
Can anyone point us in the general direction? We couldn't find anything explicit in Breeze's documentation.
There is no reason why Breeze should be insecure. Security is orthogonal. My question remains: what are your concerns?
Update 2 March 2015
Thanks for the clarifying comment ... which reflects concerns that are widely shared. I really am going to have to write about this at length in our documentation.
Fortunately, I believe I can ease your mind about the specific issues you raised.
BreezeJS, the client library, can only reach the data that your server allows the current user to access. It's the server's job to grant or refuse such requests.
This is fundamentally the same story for a client written with any technology talking to a server written with any technology. If the server has a "Customers" endpoint, then a client can request all of your customers and will receive them unless you guard that endpoint with logic on the server. This is true with or without Breeze.
You may be thinking that the metadata describes your entire database schema and therefore exposes the entire database to Breeze client requests. That statement is not true on a couple of counts.
First, even if the client knows about your entire database schema, it can't do anything with that knowledge unless you go to the trouble of exposing every table in your web api with unguarded endpoints. This is entirely within your control and it's not something you can do by accident.
Second, there is no good reason to send metadata that describe your entire database. If you let the server generate the metadata based on the Entity Framework model, you can easily limit the size and shape of that model to the subset of the database that you want to expose in your client-facing api.
After you've narrowed the model and the web api to the size and shape appropriate for your application, you must take the next step ... the step you'd take for any web api imaginable ... which is to guard the endpoints.
At a minimum that means ensuring that the user is authenticated and authorized to make requests of each endpoint. But it also means preventing unwanted responses even to authorized user requests. For example, you might want to limit on the server the number of Customers that can be returned for any given customer query. You might want to throttle the number of requests that you'll process in a single interval of time. You might want to filter the customers down to just those few that the user is allowed to see.
The techniques for doing these things are all part of the ASP.NET Web API itself, having nothing to do with Breeze whatsoever. You'll want to investigate the options that Web API has to offer.
The update side of things is actually much easier to manage with Breeze components for ASP.NET Web API. The conventional Breeze update mechanism is a batch post to a single SaveChanges endpoint. In other words, the surface area of the attack can be limited to a single endpoint. The Breeze SaveChanges method for .NET offers two interception points for security and integrity checks:
BeforeSaveEntity where you can inspect and confirm every entity individually before it gets saved to the database.
BeforeSaveEntities where you can inspect the entire batch as a whole ... to ensure that the save request is cohesive and legitimate. This is something you can't do in a simple REST-ish api where PUT/POST/DELETE requests arrive as separate, autonomous events.
The Breeze query language is highly expressive so it is possible that the client may query the server for something that you were not expecting. The expand clause is probably the most "dangerous" in this regard. Someone can query the Customer endpoint and get their related Orders, their related OrderDetails, the related Products, etc ... all at the same time.
That is a great power and with it comes responsibility. You may choose to withhold that power by refusing to allow expand queries. You can refuse select queries that can "expand" by selecting related entities. The ASP.NET Web API makes it easy to impose these restrictions.
Alternatively, you can allow expand in some cases and not others. Or you can inspect the query request within the GET endpoint's implementation and refuse it if it fails your security checks.
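The "allow expand in some cases and not others" idea boils down to checking the requested expand paths against an allow-list before running the query. The sketch below shows that check in Python purely to illustrate the logic; it is not Breeze or Web API code, the allow-list is hypothetical, and it assumes the client sends an OData-style `$expand` query parameter with comma-separated paths.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical allow-list: the only navigation paths this endpoint will expand.
ALLOWED_EXPANDS = {"Orders"}


def expand_paths(url):
    """Return the comma-separated $expand paths requested in the URL."""
    query = parse_qs(urlsplit(url).query)
    raw = ",".join(query.get("$expand", []))
    return [path.strip() for path in raw.split(",") if path.strip()]


def is_query_allowed(url):
    """Allow the query only if every requested expand path is on the list."""
    return all(path in ALLOWED_EXPANDS for path in expand_paths(url))


ok = is_query_allowed("https://example.com/breeze/Customers?$expand=Orders")
too_deep = is_query_allowed(
    "https://example.com/breeze/Customers?$expand=Orders,Orders/OrderDetails"
)
```

On the .NET side the equivalent check would live in the GET endpoint or in a query-validation attribute, which is where the answer suggests putting it.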
You could decide that you don't want certain entity types to be "queryable" at all. You can create just the specialized GET endpoints you need to support safe access to those highly sensitive types. If the supporting methods don't return IQueryable, neither Breeze nor Web API will attempt to turn the OData query parameters into LINQ queries. These endpoints look and behave exactly like the traditional REST-ish apis that are familiar to you now. And the Breeze client will be happy to play along. When you compose a Breeze query, the client doesn't know whether the server will honor that request. With Breeze you can compose any request you want and send it to any HTTP endpoint you desire. You're not limited to OData-style queries.
You don't need ONE approach for querying. You decide what entity types are exposed, how, and under what conditions. You can and should write the guard logic to ensure that the proper data are returned by a query or updated by a save ... just as you would for any web api. Both Breeze and Web API give you a rich set of tools for writing such guard logic. Your options are unbounded.
Finally, I observe that Breeze-oriented apis tend to be much smaller than the typical RESTy api ... that is, they offer fewer endpoints and (in this sense) a smaller surface area. As a practical matter, that means you can concentrate your server-side security budget on fewer methods and potentially improve both the quality of that code and your capacity to scrutinize your api's security risks.

How does Zapier/IFTTT implement the triggers and actions for different API providers?

How does Zapier/IFTTT implement the triggers and actions for different API providers? Is there a generic approach, or are they implemented individually?
I think the implementation is based on REST/OAuth, which looks generic from a high level. But Zapier/IFTTT define a lot of trigger conditions and filters, and these should be specific to each provider. Is the corresponding implementation individual or generic? If individual, it must take a vast labor force. If generic, how is it done?
Zapier developer here - the short answer is, we implement each one!
While standards like OAuth make it easier to reuse some of the code from one API to another, there is no getting around the fact that each API has unique endpoints and unique requirements. What works for one API will not necessarily work for another. Internally, we have abstracted away as much of the process as we can into reusable bits, but there is always some work involved to add a new API.
PipeThru developer here...
There are common elements to each API which can be re-used, such as OAuth authentication, common data formats (JSON, XML, etc). Most APIs strive for a RESTful implementation. However, theory meets reality and most APIs are all over the place.
Each service offers its own endpoints and there is no commonly agreed upon set of endpoints that is correct for a given kind of service. For example, within CRM software, it's not clear how a person, notes on said person, corresponding phone numbers, addresses, as well as activities should be represented. Do you provide one endpoint or several? How do you update each? Do you provide tangential records (like the company for the person) with the record or not? Each requires specific knowledge of that service as well as some data normalization.
Most of the triggers involve checking for a new record (unique id) or an updated field, most usually the last update timestamp. Most services present their timestamps in ISO 8601 format, which makes parsing timestamps easy, but not all do. Dropbox actually provides a delta API endpoint to which you can present a hash value and Dropbox will send you everything new/changed from that point. I'd love to see delta and/or activity endpoints in more APIs.
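A polling trigger of the kind described (new unique id, or a newer last-update timestamp) can be sketched in a few lines of Python. The field names `id` and `updated_at` are assumptions for the example; every real API names and shapes these differently, which is exactly the per-service work mentioned above.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def detect_changes(records, seen):
    """Return (new_ids, updated_ids) for one poll and remember what was seen.

    `seen` maps record id -> latest update timestamp observed so far.
    """
    new_ids, updated_ids = [], []
    for record in records:
        record_id = record["id"]
        timestamp = parse_ts(record["updated_at"])
        if record_id not in seen:
            new_ids.append(record_id)
            seen[record_id] = timestamp
        elif timestamp > seen[record_id]:
            updated_ids.append(record_id)
            seen[record_id] = timestamp
    return new_ids, updated_ids


seen = {}
first_poll = detect_changes([{"id": 1, "updated_at": "2024-01-01T00:00:00Z"}], seen)
second_poll = detect_changes(
    [
        {"id": 1, "updated_at": "2024-01-02T00:00:00Z"},
        {"id": 2, "updated_at": "2024-01-02T00:00:00Z"},
    ],
    seen,
)
```

Each id in `new_ids` would fire a "new record" trigger and each id in `updated_ids` an "updated record" trigger; a delta endpoint like the Dropbox one removes the need to keep `seen` yourself.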
Bottom line, integrating each individual service does require a good amount of effort and testing.
I will point out that Zapier did implement an API for other companies to plug into their tool. Instead of Zapier implementing your API and Zapier polling you for data, you can send new/updated data to Zapier to trigger one of their Zaps. I like to think of this like webhooks on crack. This allows Zapier to support many more services without having to program each one.
I've implemented a few APIs on Zapier, so I think I can provide at least a partial answer here. If not using webhooks, Zapier will examine the API response from a service for the field with the shortest name that also includes the string "id". Changes to this field cause Zapier to trigger a task. This is based off the assumption that an id is usually incremental or random.
I've had to work around this by shifting the id value to another field and writing different values to id when it was failing to trigger, or triggering too frequently (dividing by 10 and then writing id can reduce the trigger sensitivity, for example). Ambiguity is also a problem, for example in an API response that contains fields like post_id and mesg_id.
Short answer is that the system makes an educated guess, but to get it working reliably for a specific service, you should be quite specific in your code regarding what constitutes a trigger event.
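To show why that heuristic gets ambiguous, here is a small Python sketch of the rule as I described it (shortest field name containing "id"). This is my reading of the observed behaviour, not Zapier's actual code, and the function name is made up.

```python
def id_candidates(record):
    """Return the field names containing 'id' that tie for shortest length."""
    names = [name for name in record if "id" in name.lower()]
    if not names:
        return []
    shortest = min(len(name) for name in names)
    return [name for name in names if len(name) == shortest]


clear = id_candidates({"id": 7, "post_id": 12, "title": "hello"})
ambiguous = id_candidates({"post_id": 12, "mesg_id": 99, "title": "hello"})
```

When the result has more than one entry, as with post_id and mesg_id, a guess has to be made, which is why being explicit about the trigger field in your own code is the reliable route.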

Resources