Common Practice for Remote Resource Authentication
- Put up an authenticated web page with passwords for your database vendors.
- Use a proxy server of some sort to pass users through to the database vendor.
- Use a network-level transport tunnel: Virtual Private Networks (VPNs) / Point-to-Point Tunnelling Protocol (PPTP) / Layer-2 Tunnelling Protocol (L2TP)
- Use a form of authentication with the vendor other than IP address:
  - Referrer URL
  - Vendor-provided script
Overview of Proxy Servers
A proxy is a service that sits between web servers (or, more accurately, "origin web servers") and clients. This service receives requests from clients and makes requests to servers on behalf of the clients.
Reasons to use proxy servers
- Bandwidth conservation
- Statistics
- Filtering
- Remote resource access
Proxies for remote resource access
Basic theory
- Vendors use IP address restrictions to limit access to clients on the institution's network
- Site puts a proxy server in the range of valid IP addresses for the use of users outside the network
- Just because you can do it doesn't mean it is legal
- Bandwidth considerations: traffic may pass through your Internet connection twice!
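The IP address test that vendors apply can be sketched in a few lines. This is only an illustration: the address ranges are reserved documentation networks standing in for an institution's real ranges, and `is_on_campus` is a hypothetical helper name.

```python
import ipaddress

# Hypothetical campus ranges (reserved documentation networks, not real ones).
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_on_campus(client_ip: str) -> bool:
    """Return True if client_ip falls inside any licensed campus range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in CAMPUS_NETWORKS)
```

A proxy server placed inside one of these ranges passes the vendor's test on behalf of any user it forwards, which is why the authentication step in front of the proxy matters so much.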
Authentication step
- This is the most difficult step. How do we allow only authorized users to access the proxy server, and consequently the remote databases?
Sources for authentication
- Integrated Library Systems
- Barcode recognition
- Flat-file username/password
- Login test against POP/IMAP servers
- LDAP directory service
- Kerberos, Netware NDS, Microsoft Networking
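As one illustration of these sources, a flat-file username/password check might look like the sketch below. The `username:salt_hex:hash_hex` line format, the PBKDF2 parameters, and all function names are assumptions made for this example, not the format used by any particular proxy product.

```python
import hashlib
import hmac

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password: str, salt: bytes) -> str:
    """Derive a hex digest from the password with PBKDF2-HMAC-SHA256."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return digest.hex()


def parse_password_file(text: str) -> dict:
    """Parse 'username:salt_hex:hash_hex' lines, skipping blanks and '#' comments."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        username, salt_hex, hash_hex = line.split(":")
        users[username] = (bytes.fromhex(salt_hex), hash_hex)
    return users


def check_credentials(users: dict, username: str, password: str) -> bool:
    """Return True only if the user exists and the password hash matches."""
    entry = users.get(username)
    if entry is None:
        return False
    salt, expected_hash = entry
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(hash_password(password, salt), expected_hash)
```

The other sources in the list (ILS, POP/IMAP, LDAP) would replace `check_credentials` with a call to the external service while leaving the rest of the proxy login flow unchanged.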
Transparent versus non-transparent proxy servers
- Transparent Proxy
  - A proxy that passes requests and responses unmodified, except as required for proxy authentication.
- Non-transparent Proxy
  - A proxy that somehow modifies the request or response to provide some added value to the client or user.
- Rewriting Proxies
  - A special form of the Non-transparent Proxy server which examines the URLs in HTML documents passing through the proxy, and rewrites them to point back to the proxy server
  - "http://firstsearch.oclc.org/dbname=WorldCat;graphics=low;FSIP" becomes
    "http://proxy.college.edu/firstsearch/dbname=WorldCat;graphics=low;FSIP" or
    "http://proxy.college.edu:2049/dbname=WorldCat;graphics=low;FSIP" or
    "http://80-firstsearch.oclc.org.proxy.college.edu/dbname=WorldCat;graphics=low;FSIP"
Advantages, Disadvantages
- Transparent Proxy servers...
  - ...are less computationally intensive because they do not examine the content of each HTML page.
  - ...are easier to program than Rewriting Proxies.
  - ...require users to reconfigure their browsers (education problem).
  - ...may not work with some corporate or commercial Internet Service Providers.
- Rewriting Proxy servers...
  - ...require no changes to the user's browser and work with browsers on firewalled networks.
  - ...are sensitive to "incorrect" HTML.
  - ...may not work with sites using sophisticated JavaScript.
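The three rewritten URL forms given for the FirstSearch example, and the sensitivity of rewriting to "incorrect" HTML, can both be illustrated with a small sketch. The lookup tables, function names, and the deliberately naive link pattern are assumptions for illustration; they do not describe how EZproxy or any other product works internally.

```python
import re
from urllib.parse import urlsplit, urlunsplit

PROXY_HOST = "proxy.college.edu"
# Hypothetical tables mapping a vendor host to its proxy path prefix or port.
PATH_PREFIXES = {"firstsearch.oclc.org": "firstsearch"}
PROXY_PORTS = {"firstsearch.oclc.org": 2049}


def rewrite_by_path(url: str) -> str:
    """Form 1: vendor host becomes a path prefix on the proxy."""
    parts = urlsplit(url)
    path = "/" + PATH_PREFIXES[parts.hostname] + parts.path
    return urlunsplit((parts.scheme, PROXY_HOST, path, parts.query, parts.fragment))


def rewrite_by_port(url: str) -> str:
    """Form 2: vendor host is identified by a dedicated proxy port."""
    parts = urlsplit(url)
    netloc = f"{PROXY_HOST}:{PROXY_PORTS[parts.hostname]}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def rewrite_by_hostname(url: str) -> str:
    """Form 3: vendor port and host are folded into the proxy hostname."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    netloc = f"{port}-{parts.hostname}.{PROXY_HOST}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Naive pattern: only matches double-quoted href/src attributes.
LINK_PATTERN = re.compile(r'(href|src)="(http://[^"]+)"', re.IGNORECASE)


def rewrite_links(html_text: str, rewrite_url) -> str:
    """Rewrite every link the pattern recognizes; anything else passes through."""
    def replace(match):
        return f'{match.group(1)}="{rewrite_url(match.group(2))}"'
    return LINK_PATTERN.sub(replace, html_text)
```

With this pattern, a link written as `href=http://firstsearch.oclc.org/b` (no quotes) or a URL assembled by JavaScript is left untouched, so the user's next click bypasses the proxy. That is the failure mode behind the two disadvantages listed for rewriting proxies.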
Selection of Proxy Servers
| Transparent Proxy servers | Rewriting Proxy servers |
| Apache | EZproxy |
| Squid | Libproxy |
| Microsoft Internet Security and Acceleration (ISA) Server | Obvia |
| Delegate | Web Access Management |
OhioLINK
Matrix of services
| | OhioLINK-hosted | vendor-hosted |
| Participating campus networks | IP address recognition | IP address recognition |
| Unrecognized network addresses | OhioLINK Remote Authentication | OhioLINK Remote Authentication plus rewriting proxy |
Authentication system
Key points
- Authenticates users using OhioLINK-member patron databases when they come from an unknown IP address
- Stores an .ohiolink.edu domain cookie in the authenticated user's browser to serve as a security token
- Validates the security token against a database of valid cookies at each OhioLINK server
End User's Experience
- User requests services from an OhioLINK database server
- Browser is redirected to an authentication form if IP address is not recognized
- User submits credentials and is authenticated via local patron database
- Security token cookie is set in browser and browser is redirected to server from step 1
- Database server receives cookie as part of redirected HTTP request and allows access
Technical Process
- Authentication form data processed by CGI script
- Credentials checked against a cache of patron records; if not found, the record is requested from the campus ILS
- Patron record added to cache
- One-way hash used to create security token cookie
- Cookie added to "valid cookies" database
- Cookie and a page with a 'META refresh' tag are sent to browser
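The token steps above can be sketched roughly as follows. The source says only that a one-way hash is used; the choice of SHA-256, the inputs to the hash, the cookie name `ohiolink_token`, and the in-memory dictionary standing in for the "valid cookies" database are all assumptions for this example.

```python
import hashlib
import html
import secrets
import time

COOKIE_NAME = "ohiolink_token"  # hypothetical cookie name

# Stand-in for the "valid cookies" database: token -> issuing details.
valid_cookies = {}


def issue_token(patron_id: str, client_ip: str, now=None) -> str:
    """Create an opaque token with a one-way hash and record it as valid."""
    issued_at = time.time() if now is None else now
    nonce = secrets.token_hex(16)  # makes each token unpredictable
    material = f"{patron_id}|{client_ip}|{issued_at}|{nonce}".encode("utf-8")
    token = hashlib.sha256(material).hexdigest()
    # Only the IP and issue time are stored; the patron ID is not.
    valid_cookies[token] = {"ip": client_ip, "issued_at": issued_at}
    return token


def build_redirect_response(token: str, return_url: str):
    """Return headers and an HTML body that set the cookie and bounce the browser."""
    headers = [
        ("Content-Type", "text/html"),
        ("Set-Cookie", f"{COOKIE_NAME}={token}; Domain=.ohiolink.edu; Path=/"),
    ]
    safe_url = html.escape(return_url, quote=True)
    body = (
        "<html><head>"
        f'<meta http-equiv="refresh" content="0; url={safe_url}">'
        "</head><body>Redirecting...</body></html>"
    )
    return headers, body
```

Because the token is a hash over a random nonce, nothing in the cookie value itself reveals who the patron is.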
Server Architecture
- Using Apache HTTPD for database servers
- Each database server has two virtual hosts
  - IP-based authentication
  - Cookie-based authentication
- mod_rewrite is used to redirect browsers with cookies to the cookie-based virtual host
- Browsers with unrecognized IP addresses are redirected to authentication server
- Custom Apache module is used for cookie validation
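The routing decision made on each database server can be modeled as a small function. In production this logic lives in Apache configuration (mod_rewrite rules and virtual hosts); the Python below is only a model of the decision, and the order of the checks, the cookie name, and the `Route` names are assumptions.

```python
import ipaddress
from enum import Enum


class Route(Enum):
    IP_VHOST = "IP-based virtual host"
    COOKIE_VHOST = "cookie-based virtual host"
    AUTH_SERVER = "authentication server"


def choose_route(client_ip, cookies, recognized_networks, cookie_name="ohiolink_token"):
    """Decide where a request should be handled.

    cookies is a dict of cookie names to values sent by the browser;
    recognized_networks is a list of ipaddress networks for member campuses.
    """
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in recognized_networks):
        return Route.IP_VHOST
    if cookie_name in cookies:
        # The cookie is only detected here; validating it is a separate step.
        return Route.COOKIE_VHOST
    return Route.AUTH_SERVER
```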
Policies and Privacy
- Patron record at home institution must be valid: member of the institution and not expired
- Authentication failures are tracked
- Cookies expire in two hours
- Cookies are checked to see if they are coming from the same IP address
- Cookie is not added to the patron record cache; user cannot be identified from cookie
- For auditing purposes, a one-week record of cookie-to-patron matches is kept
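The two-hour expiry and same-IP policies translate into a short validation check. This is a sketch under stated assumptions: the record layout (`ip`, `issued_at`) and the function name are hypothetical, and a plain dictionary stands in for the "valid cookies" database.

```python
import time

COOKIE_LIFETIME_SECONDS = 2 * 60 * 60  # cookies expire in two hours


def is_token_valid(token, client_ip, valid_cookies, now=None):
    """Accept a token only if it is known, unexpired, and from the issuing IP."""
    current = time.time() if now is None else now
    record = valid_cookies.get(token)
    if record is None:
        return False  # never issued, or already purged
    if current - record["issued_at"] > COOKIE_LIFETIME_SECONDS:
        return False  # older than two hours
    return record["ip"] == client_ip
```

Tying the token to the issuing IP address limits the damage if a cookie is copied to another machine, at the cost of forcing re-authentication when a user's address changes mid-session.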
Proxy servers
- Both proxy servers run on an Intel-based Linux server (2 x 1.2 GHz Pentium 4 processors and 1 GB of RAM)
Delegate
- Older of the two servers
- Rewriting plug-in for Delegate
- Authorization module can selectively allow access to services based on institutional participation
EZproxy
- Rewriting proxy server (EZproxy from Useful Utilities) used for vendor-hosted databases common to all consortium participants
- Custom CGI script detects OhioLINK credentials cookies and, if necessary, redirects to the authentication server
University of Texas system
TexShare
Future Developments
"Authenticate Locally, Authorize Globally" -- Project Shibboleth
"Shibboleth ... is developing architectures, policy structures, practical technologies, and an open source implementation to support inter-institutional sharing of web resources subject to access controls."
http://middleware.internet2.edu/shibboleth/
High-level Overview
http://middleware.internet2.edu/shibboleth/docs/draft-internet2-shibboleth-arch-v05.pdf
Key Project Attributes
- Federated administration
- Access control based on attributes
- Active management of privacy
- Standards based
- A framework for multiple, scalable trust and policy sets ('Clubs')
- A standard AttributeValue vocabulary
Why Do It?
Benefits to Institutions
- Provide information services to users without cumbersome IP address restrictions and proxy servers.
- Enhance user privacy by protecting the user's identity while allowing authorization via non-personally identifying attributes.
- Single infrastructure for inter-institutional collaborative relationships.
Benefits to Information Providers
- Finer granularity of access control.
- Greatly simplified management of use rules and user bases.
- No need to maintain databases of authorized users.
For More Information...
- NISO Annual Meeting. Sunday, 2pm to 4pm at the Hyatt Regency Atlanta Hanover Room C/D. (NCIP, Shibboleth, OpenURL, METS)
- Shibboleth website. http://middleware.internet2.edu/shibboleth/
Wrap-up / Evaluation