Shibboleth Authentication for ProQuest's Chadwyck-Healey Products

Geoff Leach, Software Team Leader, ProQuest (Cambridge, UK) - October 2007

Introduction

The various Chadwyck-Healey branded products are searchable databases covering a variety of fields including scholarly journals, historical papers, literature, newspaper archives and reference works.

Each product is a separate website in the 'chadwyck.co.uk' or 'chadwyck.com' domain. For example, House of Commons Parliamentary Papers (HCPP) is served from 'parlipapers.chadwyck.co.uk'.

In these products, the 'federated login' link points to the 'shibboleth.chadwyck.co.uk' website, which runs the Shibboleth 1.3 software, consisting of a Shibboleth module in the Apache 2.2 HTTP server and the 'shibd' daemon.

Authentication Script: 'authenticate.cgi'

The custom software that completes the authentication process is a Perl script 'authenticate.cgi'. This is located in the subdirectory 'htdocs/secure' of the Apache HTTP server, so that access to the script is protected by Shibboleth.

When the user attempts to access this script the Shibboleth authentication process begins, first redirecting the user to the UK Federation's "Where Are You From" (WAYF) service, and ultimately returning the user to the 'shibboleth.chadwyck.co.uk' server, where the user's attributes are retrieved from the identity provider. All of these stages are performed by the Shibboleth software itself.

At the end of this process, when the authentication script finally runs, the information from the Shibboleth authentication will have been placed in environment variables. These include the attributes used to determine whether the user is authorised to access a particular product, such as eduPersonScopedAffiliation and eduPersonEntitlement.
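As a rough illustration of this step, the following Python sketch collects the relevant attributes from the environment. The production script is written in Perl and is not reproduced here, so the function and dictionary names are hypothetical. The variable names and the semicolon separator between multiple values are taken from the diagnostic report at the end of this document; the sketch ignores any escaping of semicolons inside a value.

```python
import os

# Environment variable names as they appear in the diagnostic report.
SHIB_VARIABLES = {
    "affiliation": "HTTP_SHIB_EP_AFFILIATION",
    "entitlement": "HTTP_SHIB_EP_ENTITLEMENT",
    "identityprovider": "HTTP_SHIB_IDENTITY_PROVIDER",
}


def read_values(name, environ):
    """Return the non-empty values of a semicolon-separated variable."""
    raw = environ.get(name, "")
    return [value.strip() for value in raw.split(";") if value.strip()]


def read_shibboleth_attributes(environ=None):
    """Collect the attributes of interest into a dict of lists."""
    if environ is None:
        environ = os.environ
    return {key: read_values(var, environ) for key, var in SHIB_VARIABLES.items()}


# Example using values from the diagnostic report.
example_environ = {
    "HTTP_SHIB_EP_AFFILIATION": "MEMBER@lse.ac.uk;EMPLOYEE@lse.ac.uk",
    "HTTP_SHIB_EP_ENTITLEMENT": "urn:mace:InCommon:entitlement:common:1",
    "HTTP_SHIB_IDENTITY_PROVIDER": "https://lse.ac.uk/idp",
}
attributes = read_shibboleth_attributes(example_environ)
print(attributes["affiliation"])
```

Passing the environment as an argument, rather than reading 'os.environ' directly, makes the function easy to exercise with recorded values such as those in a diagnostic report.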

The script interprets the contents of these environment variables and attempts to match them to a ProQuest customer account.

If the match succeeds, the user is redirected to a URL in the product that logs the user in under that institution's account and then forwards the user to the page that was originally requested.

Invocation Parameters for the Authentication Script

Suppose that a user attempts to access the main search page of House of Commons Parliamentary Papers (HCPP) and, upon being presented with the login page, chooses the link for a federated login via Shibboleth.

This link will invoke the authentication script via a URL such as:

     https://shibboleth.chadwyck.co.uk/secure/authenticate.cgi?
         product=HCPP&
         location=UK&
         returnpage=http://parlipapers.chadwyck.co.uk/shibbolethLogin.do&
         forward=/search/search.jsp

where the parameters are:

  • product - identifies the product which the user is attempting to access.
  • location - identifies whether the product is located on the UK or US server. This determines which server's customer database is consulted in order to match the Shibboleth attributes to a customer account.
  • returnpage - the URL in the product which will process the results of the Shibboleth authentication. If the user was successfully matched to a customer account this will log the user in under that account; otherwise it displays an error page.
  • forward - the URL in the product which the user was originally trying to access (such as the main search page). The user is forwarded on to this page after being logged in.
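The URL above is split across lines for readability. A product that generates such a link would normally percent-encode the parameter values, since 'returnpage' is itself a URL. The following Python sketch shows one way to build the link; the function name 'build_login_url' is hypothetical and is not part of any ProQuest product.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://shibboleth.chadwyck.co.uk/secure/authenticate.cgi"


def build_login_url(product, location, returnpage, forward):
    """Build the federated-login link with percent-encoded parameters."""
    query = urlencode({
        "product": product,
        "location": location,
        "returnpage": returnpage,
        "forward": forward,
    })
    return f"{BASE}?{query}"


url = build_login_url(
    "HCPP",
    "UK",
    "http://parlipapers.chadwyck.co.uk/shibbolethLogin.do",
    "/search/search.jsp",
)

# Decoding the query string recovers the original parameter values.
params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
print(url)
```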

Matching Shibboleth Attributes to a Customer

The authentication script needs details of the ProQuest customer accounts that use Shibboleth, so that it can match a user's attributes to an institution's account.

At present, while only dozens or hundreds of accounts use Shibboleth, these details are supplied as a flat file containing an extract from the Chadwyck-Healey customer database (referred to as 'Webtools' in the script).

This file is read every time an authentication is performed. Currently this has a minimal impact upon performance, but testing indicates that it will take up to 5 seconds to process this file once it contains 2000 accounts, at which point a more sophisticated approach will be needed.

The extract contains four fields for each account (the identifying code and name for the customer, the rules for matching Shibboleth attributes to the account, and the list of products to which the account is subscribed). The following table shows some examples:

     Client code               : camtest
     Client name               : ProQuest Cambridge test account
     Shibboleth matching rules : affiliation="student|staff|faculty|employee|member" && scope="proquest.co.uk" || entitlement="https://shibboleth.chadwyck.co.uk/test-entitlement.html" && product="HCPP|PIO|PAO"
     Product codes             : EEBO HCPP LION PAO PIO

     Client code               : lonscheco
     Client name               : London School of Economics
     Shibboleth matching rules : affiliation="student|staff|faculty|employee|member" && scope="lse.ac.uk"
     Product codes             : HCPP PAO PIO

     Client code               : ucambridge
     Client name               : University of Cambridge
     Shibboleth matching rules : affiliation="member" && scope="cam.ac.uk"
     Product codes             : HCPP LION
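The layout of the flat file is not described in this document. Purely as an illustration, the Python sketch below assumes one account per line with the four fields separated by tab characters and the product codes separated by spaces. Both the layout and the function name 'load_accounts' are assumptions.

```python
import io


def load_accounts(handle):
    """Read accounts from a tab-separated extract (assumed layout)."""
    accounts = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            # Skip malformed lines rather than guessing at their meaning.
            continue
        code, name, rules, products = fields
        accounts.append({
            "code": code,
            "name": name,
            "rules": rules,
            "products": products.split(),
        })
    return accounts


# Two of the example accounts, written in the assumed layout.
sample = io.StringIO(
    'lonscheco\tLondon School of Economics\t'
    'affiliation="student|staff|faculty|employee|member" && scope="lse.ac.uk"\t'
    'HCPP PAO PIO\n'
    'ucambridge\tUniversity of Cambridge\t'
    'affiliation="member" && scope="cam.ac.uk"\tHCPP LION\n'
)
accounts = load_accounts(sample)
print([account["code"] for account in accounts])
```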

The rules for how Shibboleth attributes are matched against an account are encoded as a boolean expression, with alternative rules separated by '||' (OR), and with the terms in each rule separated by '&&' (AND).
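A minimal Python sketch of how such an expression could be broken down is shown here. It assumes that '||' and '&&' never occur inside a quoted value, which holds for every example in this document, and the function name 'parse_rules' is hypothetical. The production Perl script may parse the rules differently.

```python
import re

# One term of a rule, such as scope="lse.ac.uk".
TERM = re.compile(r'\s*(\w+)\s*=\s*"([^"]*)"\s*')


def parse_rules(expression):
    """Parse a rule expression into a list of alternative rules.

    Each alternative is a dict that maps a term name to the set of
    values accepted for that term.
    """
    alternatives = []
    for alternative in expression.split("||"):
        terms = {}
        for term in alternative.split("&&"):
            match = TERM.fullmatch(term)
            if match is None:
                raise ValueError(f"cannot parse term: {term!r}")
            name, values = match.groups()
            terms[name] = set(values.split("|"))
        alternatives.append(terms)
    return alternatives


camtest_rules = parse_rules(
    'affiliation="student|staff|faculty|employee|member" && scope="proquest.co.uk"'
    ' || entitlement="https://shibboleth.chadwyck.co.uk/test-entitlement.html"'
    ' && product="HCPP|PIO|PAO"'
)
print(len(camtest_rules))
```

A user then matches an account if every term of at least one alternative is satisfied.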

Rule for matching eduPersonScopedAffiliation

The usual rule is of the form:

affiliation="student|staff|faculty|employee|member" && scope="proquest.co.uk"

which matches the eduPersonScopedAffiliation attribute (which consists of two parts separated by an '@' symbol, as in 'employee@proquest.co.uk').

The rule specifies the acceptable values for the initial 'affiliation' part (as a list of alternatives separated by vertical bars) and the value for the final 'scope' part (which will be a DNS domain registered to the institution).
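The check described above can be sketched in Python as follows. The function name is hypothetical. The comparison is made case-insensitively, which is an assumption based on the diagnostic report later in this document, where the value 'MEMBER@lse.ac.uk' matches a rule written as affiliation="member".

```python
def affiliation_rule_matches(scoped_value, affiliations, scope):
    """Return True if 'affiliation@scope' satisfies the rule.

    affiliations -- set of acceptable affiliation names, in lower case
    scope        -- the DNS domain that the rule requires
    """
    affiliation, separator, value_scope = scoped_value.partition("@")
    if not separator:
        # A value without '@' has no scope and can never match.
        return False
    return (affiliation.lower() in affiliations
            and value_scope.lower() == scope.lower())


allowed = {"student", "staff", "faculty", "employee", "member"}
print(affiliation_rule_matches("employee@proquest.co.uk", allowed, "proquest.co.uk"))
```

The scope is compared for equality rather than by suffix, so a value scoped to a subdomain of the institution's domain does not match a rule for the parent domain.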

Rule for matching eduPersonEntitlement

More rarely, a rule may be of the form:

identityprovider="https://typekey.sdss.ac.uk/shibboleth" && entitlement="https://shibboleth.chadwyck.co.uk/test-entitlement.html" && product="HCPP|PIO|PAO"

which matches the eduPersonEntitlement attribute, containing an identifying URL for the entitlement. This URL would resolve to a document, such as a legal contract, which describes the scope of the entitlement.

The 'entitlement' term of the rule specifies the value for this attribute, and the 'product' term lists the products to which the entitlement applies (so as to ensure that the user is not inadvertently granted access to all the products to which the institution has subscribed).

The 'identityprovider' term matches the Provider ID of the customer's identity provider. This term ensures that we only accept the entitlement when received from this particular identity provider. It prevents a malicious third party from gaining access by running an identity provider which asserts arbitrary entitlements.
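The three terms of such a rule could be evaluated as in the Python sketch below. The dictionary layout and the function name are hypothetical, and the sketch only illustrates the logic described above rather than the code of the production Perl script.

```python
def entitlement_rule_matches(rule, identity_provider, entitlements, product):
    """Return True if an entitlement rule grants access to the product.

    rule              -- dict with 'entitlement' and 'product' keys and an
                         optional 'identityprovider' key
    identity_provider -- Provider ID that asserted the attributes
    entitlements      -- list of eduPersonEntitlement values received
    product           -- product code that the user is trying to access
    """
    required_idp = rule.get("identityprovider")
    if required_idp is not None and required_idp != identity_provider:
        # The entitlement was asserted by an identity provider we do not trust for it.
        return False
    if rule["entitlement"] not in entitlements:
        return False
    # The entitlement only covers the products listed in the rule.
    return product in rule["product"]


rule = {
    "identityprovider": "https://typekey.sdss.ac.uk/shibboleth",
    "entitlement": "https://shibboleth.chadwyck.co.uk/test-entitlement.html",
    "product": {"HCPP", "PIO", "PAO"},
}
granted = entitlement_rule_matches(
    rule,
    "https://typekey.sdss.ac.uk/shibboleth",
    ["https://shibboleth.chadwyck.co.uk/test-entitlement.html"],
    "HCPP",
)
print(granted)
```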

Preventing Multiple Matches

Many institutions have a hierarchical structure and it is possible for products to be sold at different levels of an organisation. A university might subscribe to certain products that are of interest to all its users, whereas more specialised products might be limited to a particular campus, college or department.

If the Shibboleth identity provider returns sufficient information about a user's affiliations, the authentication script can distinguish the correct account to match against for a particular product.

As an entirely hypothetical and wholly untrue example, suppose that the University of Cambridge subscribes to PAO, and Trinity College Cambridge subscribes to HCPP.

These would result in two customer accounts, which might appear as follows:

     Client code               : ucambridge
     Client name               : University of Cambridge
     Shibboleth matching rules : affiliation="member" && scope="cam.ac.uk"
     Product codes             : PAO

     Client code               : trinitycam
     Client name               : Trinity College (University of Cambridge)
     Shibboleth matching rules : affiliation="member" && scope="trinity.cam.ac.uk"
     Product codes             : HCPP

Suppose that a student at Trinity College then attempts to log in to either HCPP or PAO using the Cambridge University Computing Service. In either case the University of Cambridge's identity provider returns the same value for the eduPersonScopedAffiliation attribute.

If this value includes the affiliation to both the university and to the college, as in:

     eduPersonScopedAffiliation = 'member@cam.ac.uk;member@trinity.cam.ac.uk'

then the user matches both the 'ucambridge' and 'trinitycam' accounts in the customer database.

However, depending on whether the user is trying to access PAO or HCPP, the authentication script can select the single account that is subscribed to the requested product.

Identity providers do not typically release this level of detail about a user, but the authentication script has been written to account for this possibility.
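The selection step in this hypothetical example can be sketched in Python as follows. The data structures and function names are illustrative only. Returning no account when zero or several accounts remain is an assumption, since this document does not say how the script handles those cases.

```python
ACCOUNTS = [
    {"code": "ucambridge", "affiliations": {"member"},
     "scope": "cam.ac.uk", "products": {"PAO"}},
    {"code": "trinitycam", "affiliations": {"member"},
     "scope": "trinity.cam.ac.uk", "products": {"HCPP"}},
]


def matches_account(account, scoped_values):
    """Return True if any scoped affiliation satisfies the account's rule."""
    for value in scoped_values:
        affiliation, _, scope = value.lower().partition("@")
        if affiliation in account["affiliations"] and scope == account["scope"]:
            return True
    return False


def select_account(accounts, scoped_values, product):
    """Return the code of the single matching account subscribed to product."""
    candidates = [
        account for account in accounts
        if matches_account(account, scoped_values) and product in account["products"]
    ]
    if len(candidates) == 1:
        return candidates[0]["code"]
    return None  # assumption: no match, or an ambiguous match, is treated as failure


values = "member@cam.ac.uk;member@trinity.cam.ac.uk".split(";")
print(select_account(ACCOUNTS, values, "PAO"), select_account(ACCOUNTS, values, "HCPP"))
```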

Shibboleth Test Page

To assist in software testing, setting up new customer accounts, and resolving access management problems, there is a Shibboleth test page.

This is accessed by adding an extra parameter 'testmode=Y' to the URL for the authentication script. The test page is then displayed at the end of the Shibboleth authentication sequence, instead of redirecting the user back into the product.

The page includes a diagnostic report of the Shibboleth authentication process, including the attributes retrieved from the identity provider, and how these attributes were matched to any ProQuest customer account. At the bottom of the page is a form for emailing the report to ProQuest's technical support team.

The HTML template for the test page is in the file 'test_page_template.html', which contains text such as 'XXX_REPORTHTML_XXX' that is replaced by the text of the report.
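The placeholder substitution can be sketched in Python as below. The sketch assumes that the report is plain text that must be HTML-escaped before insertion and wraps it in a 'pre' element; both choices, and the function name 'render_test_page', are assumptions rather than a description of the production script.

```python
import html

PLACEHOLDER = "XXX_REPORTHTML_XXX"


def render_test_page(template, report_text):
    """Insert the escaped report into the template in place of the placeholder."""
    report_html = "<pre>" + html.escape(report_text) + "</pre>"
    return template.replace(PLACEHOLDER, report_html)


page = render_test_page(
    "<body>XXX_REPORTHTML_XXX</body>",
    'affiliation="member" && scope="lse.ac.uk"',
)
print(page)
```

Escaping matters here because the matching rules contain '&&' and double quotes, which would otherwise be interpreted as HTML markup.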

A CGI script 'cgi-bin/test_page_send_email.cgi' is needed for emailing the report back to ProQuest.

Example Report from the Test Page

The following example shows the result of a successful authentication, obtained during testing by staff at the London School of Economics.


 SHIBBOLETH AUTHENTICATION - DIAGNOSTIC REPORT

 Authentication performed at: Thu Oct 25 09:32:18 2007

 Parameters supplied to authentication script:
     location = 'UK'
     product = 'HCPP'
     testmode = 'Y'

 Environment variables containing Shibboleth attributes are:
     HTTP_SHIB_APPLICATION_ID = 'default'
     HTTP_SHIB_AUTHENTICATION_METHOD = 'urn:oasis:names:tc:SAML:1.0:am:unspecified'
     HTTP_SHIB_EP_AFFILIATION = 'MEMBER@lse.ac.uk;EMPLOYEE@lse.ac.uk'
     HTTP_SHIB_EP_ENTITLEMENT = 'urn:mace:InCommon:entitlement:common:1'
     HTTP_SHIB_EP_UNSCOPEDAFFILIATION = 'MEMBER;EMPLOYEE'
     HTTP_SHIB_IDENTITY_PROVIDER = 'https://lse.ac.uk/idp'
     HTTP_SHIB_ORIGIN_SITE = 'https://lse.ac.uk/idp'
     HTTP_SHIB_TARGETEDID = 'aKxwrUUdX/8uNVxD02Lbw/OWatE=@lse.ac.uk'

 Obtained 2 value(s) for scoped affiliation:
     MEMBER@lse.ac.uk
     EMPLOYEE@lse.ac.uk

 Obtained 1 value(s) for entitlement:
     urn:mace:InCommon:entitlement:common:1

 These Shibboleth attributes match exactly one customer account in the UK database:

 ------------------------------------------------------------------------
 Client code           : lonscheco
 Client name           : London School of Economics (British Library of Political & Economic Science)
 Shibboleth rule list  : affiliation="student|staff|faculty|employee|member" && scope="lse.ac.uk"
 Subscribed products   : ESO, HCPP, KNOWUK, PAO, PIO, STATS
 Matches user via rule : affiliation="member" && scope="lse.ac.uk"
 Matches user via rule : affiliation="employee" && scope="lse.ac.uk"
 ------------------------------------------------------------------------

 AUTHENTICATION SUCCEEDED - USER AUTHENTICATED AS: lonscheco