How can I scrape data from a website protected with Shibboleth? - screen-scraping

I am attempting to scrape data from one of my University's websites, which uses Shibboleth as a form of authentication/protection. However, I am having difficulty determining the best way to get past it and to the page I wish to scrape. I have valid credentials, which I could use to log in with. Does anyone have any suggestions for how to accomplish this task?

I have been working on scripting Shibbolized login with success ( in my case, to monitor the health of both the Shibboleth IdP and the applications it protects).
I am using Python's urllib module and their classes to handle the redirect following and cookie passing (for Shibboleth) and login form posting. After a little bit of tinkering urllib gets you most of the way to success with Shibbolized login. You could use this approach to handle the initial login to the Shibbolized website and then handle the scraping with a straight forward use of Python's urllib.
Example Python script for logging into Shibboleth

You could use Mechanize to submit forms and login to the website: http://wwwsearch.sourceforge.net/mechanize/

I believe that ECP profile was design to access Shibboleth protected resources by non-browser client (i.e. command line)
Try one of sample clients available on Shibboleth wiki page I linked above

You can also try Apache JMeter, just record your actions, make some scripting (well it is not so easy in terms of shibboleth), and you can access this pages automatically.
[Edit - better solution]
I believe that on Shibboleth Documentation pages are scripts for Grinder (another load testing tool). This test plans where in fact Python (ok Jython) scripts which should be quite easily modified and used for your purposes

Very late reply, but you could use Facebook Webdriver to do a login and scrape after you're authenticated.

Related

Possible to access Piwik getVisitorLog through HTTP API?

I'm building some reporting tool. Ideally I want to avoid going through web server logs myself and use (some of) the power of Piwik.
The stuff I get from the visitor log would be a good start, this is at http://example.com/piwik/index.php?module=CoreHome&action=index&idSite=1&period=day&date=yesterday#/module=Live&action=getVisitorLog&idSite=1&period=day&date=yesterday
Unfortunately I can't find a getVisitorLog action in the HTTP API docs at
http://developer.piwik.org/api-reference/reporting-api#Actions (and it's also not an undocumented feature, method=Actions.getVisitorLog gives me
The method 'getVisitorLog' does not exist or is not available in the module '\Piwik\Plugins\Actions\API'.
Is there another way to get to this? Or should I write a plugin for Piwik?
Apparently it is possible through the Live plugin API:
http://developer.piwik.org/api-reference/reporting-api#Live
This works as desired:
http://example.com/piwik/index.php?module=API&method=Live.getLastVisitsDetails&format=JSON&idSite=1&period=day&date=2015-07-21&expanded=1&token_auth=XXXXX&filter_limit=100

Authentication with AngularJS - NodeJS

I cant figure out how to do a simple Authentication in my AngularJS app.
What would be the best and easiest way to do a normal server side authentication with my Setup:
Yeoman angular generator, running grunt server on :9000.
Does anyone have a good tutorial? or any tips?
Another question, what is the simplest way to store data with this setup? using MongoDB?
Thank you :)
AngularJS is a front-end JavaScript framework, you can use anything of your choice, loving and knowing at the back-end for your application. This question was something like you are asking "I am using HTML5, what should I use at my back-end?" Angluar can be used with many server-side languages, viz. Ruby, Node, PHP.
There is an awesome tutorial talking about Ruby on Rails + Angular by David Bryant Copeland.
If you want to use PHP, you could use any framework which comforts you, there are many
available. CodeIgniter is one of the popular PHP framework.
If you want to use Node for your application, Passport.js could
be something of interest. MEAN Stack is the new thing which is coming up, MEAN stands for MongoDB + Express.js + Angular.js + Node.js. There is a ready Yoeman generator for MEAN stack available.
Again depending upon the requirement you should choose between SQL or NoSQL database. Also depends upto certain extend on the choice of the server-side language.
If you need a scalable database which stores hierarchical data, NoSQL should be your choice. MongoDB is a popular NoSQL database; CouchDB, RethinkDB are other alternatives.
SQL database are used where application needs high transaction. Though we can use NoSQL database for transaction based application, but it is not stable in comparision to SQL databases. MySQL is the most commonly used SQL database.
Angularjs is really not involved in the authentication process.
The basic flow of authentication is quite simple:
You make a post request from your 'Angularjs app' (your client side application) to your server, passing as parameters a pair of username/password.
Then, on your nodejs application (server side) you query your database looking for the username provided and you try to match the password that was sent within the post request with the password you have stored in your database related to that user. If they matches, you set up a cookie on user's client which is related to a session stored in your database.
To make this simpler there are some libraries that help you.
You could use passport.js (http://passportjs.org/) together with express (http://expressjs.com/). Passport will help you setting up with ease the authentication process in a safe way and express will give you tools like the cookie parser, support for session and other tools you will need to use. Express will help you also setting an endpoint where you will post the request for logging in to.
Last thing, for storing data, you can use any database you want.
Nodejs has a relevant number of third party libraries that help interfacing with databases. If you want to use mongodb, this library (https://github.com/mongodb/node-mongodb-native) will make your life easy.
Your best bet would be to use the angular-fullstack generator as it comes with basic authentication -- both local and oAuth -- built in. From there, you can either just use the authentication that is setup for you or you can use it as a reference to help you figure out how its all working.
The easiest setup I am using is:
NodeJS / Express / Passport / Socket.io / MongoDB (can be anything else actually)
AngularJS
Express is handling all the security with Passport (there's a passport method being added to express req automatically that gives you an ability to check whether user is authenticated or not). It's called isAuthenticated and you can use it like below:
function loggedOnly(req, res, next) {
if (req.isAuthenticated()) {
next();
} else {
res.redirect('/');
}
}
Route / & /login are public, but when user is logged in, they redirect to /app.
Route /app is Angular app which is private and redirects to /login when user is not authenticated.
app.get('/', anonOnly);
Socket.io has passport.socketio middleware for automatically refusing unauthorised connections that may occur.
Then, I access user object by doing the following:
io.on('connection', function(socket) {
var user = socket.conn.request.user;
});
To sum up, login logic is handled by static Express views. Moreover, Express prints few constant's to Angular app containing e.g. userData. Quite useful for displaying e.g. userName at some point.
Although it may take some time to set that up, it's definitely worth doing if you want to understand the whole logic of your app. I've given you the names of open-source packages to use, all the details can be found in their readme & guides (lots of them already exists if you google for a while).

Add vendors in quickbooks from my salesforce application?

From my salesforce application, I need to connect to quickbook api and create vendors by a batch job.
For this Do I need to add whole Oauth process (add 'Connect to quickbooks' button and there will be a auth page, which when successfull will redirect me to the application).
Or there are other ways in which I can do this.
Can I use connection ticket. If yes, the how ?
Reall stuck here. Any help is appreciated.
Thanks,
You can have a look at IPP's docs. - https://developer.intuit.com/docs/0025_quickbooksapi
The only way to make a call to QBO endpoints is through 3-legged OAuth (using consumerKey, consumerSecret, accessKey and accessSecret). If you have desktop application then you need to have a web component/embedded browser for the first time users. Once you have the end-user tokens, you can store and reuse those for all future API calls.
If you create an app in appcenter, you'll get consumerKey and consumerSecret.
https://developer.intuit.com/Application/Create/IA
Using the above two tokens, you can generate accessToken and accessSecret using the OAuthPlayground.
https://appcenter.intuit.com/Playground/OAuth/IA
Devkit Download link - https://developer.intuit.com/docs/0025_quickbooksapi/0055_devkits
You need to plugin the above 4 tokens with java devkit code to make any QBO V3 REST call.
https://developer.intuit.com/docs/0025_quickbooksapi/0055_devkits/0201_ipp_java_devkit_3.0/0001_synchronous_calls/0001_data_service_apis
Re - Can I use connection ticket. If yes, the how ?
No, OAuth is the only process here.
Vendor API doc - https://developer.intuit.com/docs/0025_quickbooksapi/0050_data_services/030_entity_services_reference/vendor
Hope it will be useful.
Thanks

Recommended authentication UX in AngularJS SPA with own and external (Google, FB...) profiles

I'm developing an Asp.net MVC + Web API + AngularJS SPA. I would like to have several types of registration/authentication:
own profile provider
external providers ie Google, FB etc.
Possible scenarios
As I'm having an SPA it would be best if I could keep my user on my page while external (or internal for that matter) would be taking place. I'd display a modal layer with particular content loaded (maybe even inside an iframe). Can this be done? Online examples?
Have login/registration capability implemented as usual Asp.net MVC full page reload controller/views and then redirect back to my SPA when that is successful. Also redirect to external provider if users wanted to authenticate/register using external provider.
Any other possibility?
Questions
How did you do this similar scenario in your SPA or how would you recommend to do it?
Should I be using particular authentication patterns regarding this - for instance provide my internal authentication/registration similar to external one so SAP would always behave in the same way
I will also have to authenticate my Web API calls subsequently after user athenticated themselves in the SPA. Any guidance on that?
I can only comment on my own experience, maybe it is helpful. We use the same stack as you, Asp.net MVC + Web API + AngularJS. We use server-side MVC for authentication (Microsoft.AspNet.Identity), since we are not exposing a public API at this stage and the only consumer of the API will be our SPA this works perfectly with the least amount of effort.
This also enables us to set a UserContext Angular service on the server once logged in that can be shared through your entire Angular app, the Google Doubleclick Manager guys goes into some of the benefits of this approach during there ng-conf presentation. Since Web Api supports Asp.Net Identity, authentication and authorization works seamlessly between MVC and Web Api.
To sum up the major pros and cons:
Pros:
Very easy and quick to implement.
Works across MVC and Web Api.
Clientside code does not need to be concerned with authentication code.
Set UserContext Angular service on server side once during login, easily shared throughout SPA using Angular DI. See presentation as mentioned above.
Integrates with external providers as easily as you would with any normal MVC app.
Cons:
Since the browser does not send the hash # part of the URL to the server, return URL on login will always be the root of your SPA. E.g. suppose your SPA root is /app, and you try to access /app#/client when you aren't authenticated, you will be redirected to the login page, but the return URL will be /app and not /app#/client as the server has no way to know the hash part of the URL as the browser never sends this.
Not really supported design if you plan to make your your Web Api available outside your SPA. Imagine a console app trying to connect to your API?
So in short, the MVC view that we use to bootstrap our SPA is protected with [Authorize] as well as our Web Api methods. Inside the MVC view we also initialize our UserContext Angular service using Razor to inject whatever user properties we want to expose. Once the SPA is loaded via the single Razor view, everything else is handled via Angular.
We have used what Beyers described before and it works well for most apps, and I use it frequently.
In our current application we are working on the premise that separation of concern should apply to route management.
Normal lifecycle:
User goes to www.server.com
Server sends down index.html
Client makes request for minified assets (.js, .css., etc.)
Angular loads -- a directive removes the loading class from the body (revealing the login section)
The Angular LoginCtrl makes an autologin attempt. (Login and Autologin in an Angular service).
The server returns a HTTP 401
The login screen remains visible.
User successfully logs in ( server gives the browser a authToken cookie; angular does not know or care)
Angular sets some isAuthenticated variables in the BodyCtrl and LoginCtrl
The login section receives a class of .hidden and the content recieves a class of .visible (insert ng-hide/show animations for fun)
User starts filling out a form, but takes an obligitory, 30 minute phone call from relative.
Server has expired his session 10 minutes ago
User finishes and submits form but the server return unauthorized (401)
http-auth-interceptor intercepts the 401 from the server, caches the submit call and publishes a "login-required' event.
The BodyCtrl listens and sets isAuthenticated = false and then the ng-class and ng-show/hide do there work on the login and content sections.
User re-signs in and 'login-confirmed' event is published
http-auth-interceptor posts cached call.
User is happy
(the content section can also display some public views as our rest api has some routes that are made public -- displaying the public views is handled by a simple function similar to isAuthenticated)
Angular Ctrl structure:
index.html
<body>
<!-- I am a fullscreen login element, z-index=2000-->
<div data-ng-controller="LoginCtrl" data-ng-hide="isAuthenticated()"</div>
<div data-ng-controller="ContentCtrl">
<!-- fullscreen class has a z-index=2001 -->
<section data-ng-view data-ng-class="{fullscreen: isViewPublic()}"></section>
<!-- header and nav go here -->
</div>
</body>
We could get a little more creative on how display the public views/routes but you get the idea. We only have a few public routes and they are mainly for registration, password resets, etc.
Disclaimer: I have yet to integrate with and oauth/external authentication services. Hopefully this setup will still hold water.
Any critique of this process is welcome.
By no means am I familiar with Microsoft backends, but still I'll give it a try ;-) :
Good resources on how the authentication/authorisation should be done in Angular-based SPA:
https://github.com/fnakstad/angular-client-side-auth
live demo: http://angular-client-side-auth.herokuapp.com/login
As you requested there are 2 methods of authenticating:
own profile
external providers.
It redirects to the provider website though :-/
NodeJS on the backend
Good ng-conf talk on how authorisation is done in Google Doubleclick Manager application: http://www.youtube.com/watch?v=62RvRQuMVyg&t=2m29s
It's not quite what you want (authentication), but the solution begins to kick in on the authentication phase. Furthermore it may be useful later and the approach Ido is presenting seems really sound.
Slides: https://docs.google.com/file/d/0B4F6Csor-S1cNThqekp4NUZCSmc/edit
Last but not least: Mastering Web Application Development with AngularJS.
A brilliant Angular book by Paweł Kozłowski and Pete Bacon Darwin.
It has a whole chapter or two dedicated to auth- stuff. It shows some complex solutions, such as retrial and session-expired interceptors. But even if you will not use approaches from the book directly, those chapters are still a must-reads since they may give you an inspiration for devising your own auth- solutions.
Remark - http-auth-interceptor: As it is mentioned in the book, the securityInterceptor solution was originally invented by Witold Szczerba. See the blog post.
http-auth-interceptor code, mentioned by #CorySilva, is actually sample code to concepts explained in the post.
btw: Those 2 chapters are great, but I hope that the Community comes up with some easier solutions in the future. Every time I read this interceptor promise api-based code I get a severe headache :)
btw2: If somebody doesn't consider oneself an Angular expert, the entire book is definetly a must-read and great complement after reading the Guide
As for building the login page with ASP - I suggest using ASP only as a backend and middleware and drawing whole the app with Angular.
You can start with your approach and switch to the pure-Angular SPA if it will begin to require more and more crazy hacks to make technologies play together nicely.
But I might be wrong and this particular case won't require applying any hacks.

Using a subdomain to identify a client

I'm working on building a Silverlight application whereas we want to be able to have a client hit a url like:
http://{client}.domain.com/
and login, where the {client} part is their business name. so for example, google's would be:
http://google.domain.com/
What I was wondering was if anyone has been able, in silverlight, to be able to use this subdomain model to make decisions on the call to the web server so that you can switch to a specific database to run a query? Unfortunately, it's something that is quite necessary for the project, as we are trying to make it easy for their employees to get their company specific information for our software.
Wouldn't it work to put the service on a specific subdomain itself, such as wcf.example.com, and then setup a cross domain policy file on the service to allow it to access it?
As long as this would work you could just load the silverlight in the proper subdomain and then pass that subdomain to your service and let it do its thing.
Some examples of this below:
Silverlight Cross Domain Services
Silverlight Cross Domain Policy Helpers
On the server side you can check the HTTP 1.1 Host header to see how the user came to your server and do the necessary customization based on that.
I think you cannot do this with Silverlight alone, I know you cannot do this without problems with Javascript, Ajax etc. . That is because a sub domain is - for security reasons - treated otherwise than a sub-page by the browsers.
What about the following idea: Insert a rewrite rule to your web server software. So if http://google.domain.com is called, the web server itself rewrites the URL to something like http://www.domain.com/google/ (or better: http://www.domain.com/customers/google/). Would that help?
Georgi:
That would help if it would be static, but alas, it's going to all be dynamic. My hope was to have 1x deployment for the application, and to use the http://google.domain.com/ idea to switch to the correct database for the user. I recall doing this once when we built an asp.net website, using the domain context to figure out what skin to use, etc.
Ates: Can you explain more about what you are saying... sounds like you are close to what I am trying to come up with. Have you seen such a tutorial for this?
The only other way I have come up with to make this work is to have a metabase that when the user logs in, it will switch them to the appropriate database as required... was just thinking as well that telling Client x to hit:
http://ClientX.domain.com/ would have been sweeter than saying to hit http://www.domain.com/ and login. It seemed as if they were to hit their name, and to show it personalized for them right from the login screen would have been much more appealing for the client base.
#Richard B: No, I can't think of any such tutorial that I've seen before. I'll try to be more verbose.
The server-side approach in more detail:
Direct *.example.com to the same IP in your DNS settings.
The backend app that handles login checks the Host HTTP header (e.g. the "HTTP_HOST" server variable in some platforms). That would contain the exact subdomain.example.com that the client used for reaching your server. Extract the subdomain part and continue...
There can also be a client-side-only approach. I don't know much about Silverlight but I'm assuming that you should be able to interface Silverlight with JavaScript. You could read document.location with JavaScript and pass it to your Silverlight applet, whereon further data fetching etc. logic would rely on the subdomain that was passed in by JavaScript.
#Ates:
That is what we did when we wrote the ASP.Net system... we pushed a slew of *.example.com hosts against the web server, and handled using the HTTP headers. The hold-up comes when dealing with WCF pushing the info between the client and the server... it can only exist in one domain...
So, for example, when you have {client}.example.com and {sandbox}.example.com, the WCF service can't be registered to both. It also cannot be registered to just *.example.com or example.com, so that's where the catch 22 is coming in at. everything else I have the prior knowledge of handling.
I recall a method by which an application can "spoof" another domain name in certain instances. I take it in this case, I would need to do such a configuration? Much to research yet I believe.

Resources