Proxy Web Servers and Authentication

Program Overview

Today's Schedule

Order of topics

Facility Logistics

What are proxies?

A proxy is a service that sits between web servers (or, more accurately, "origin web servers") and clients. This service receives requests from clients and makes requests to servers on behalf of the clients.

A caching proxy, in addition to the above, saves a copy of the HTML pages, graphics, and other resources as they pass through.

Traditional reasons to use caching proxies

  1. To reduce latency...
    Latency is the delay between when a client makes a request and when the entire response is received. If clients on a LAN use a caching proxy server on the same LAN, the proxy can usually answer requests for objects already in its cache faster than the origin server could.
  2. To reduce traffic...
    Since copies of an HTML page or a graphic are stored locally, any request for that item after the first one can be served using the proxy's store of files.
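
The two reasons above can be illustrated with a toy model. This is only a sketch: the class name, the fake_origin function, and the counter are hypothetical and are not taken from any proxy package; a real proxy would make network requests and obey expiration rules.

```python
class CachingProxy:
    """Toy caching proxy: keeps a copy of every entity it retrieves."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # hypothetical origin fetcher
        self.cache = {}            # URL -> stored entity
        self.origin_requests = 0   # how many times the origin was contacted

    def get(self, url):
        if url in self.cache:
            # Served from the local store: no traffic on the Internet link.
            return self.cache[url]
        self.origin_requests += 1
        entity = self.fetch_from_origin(url)
        self.cache[url] = entity
        return entity


def fake_origin(url):
    # Stand-in for an origin server; a real proxy would send an HTTP request.
    return "<html>content of %s</html>" % url


proxy = CachingProxy(fake_origin)
for _ in range(3):
    page = proxy.get("http://www.college.edu/site-map.html")
print(proxy.origin_requests)
```

Three client requests lead to a single request to the origin server; the second and third are answered from the proxy's store.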

Library Scenarios

Bandwidth Conservation

  • Your access to the Internet is slow, and you've determined that it is because the line is near capacity. You know a lot of people are using web sites, and you suspect they are using similar web sites. Do you need to purchase additional bandwidth from your Internet Service Provider?
  • Your network connection to a branch site is slow and congested. Much of the access at the branch site is for resources hosted on your main network. Can you stretch the useful lifetime of that old line any further?

Statistics

  • Budget time again, and it seems like your electronic resource line is taking up more and more money. How do you know what your patrons are using and if usage patterns are worth the cost of a resource?

Filtering

  • You have a limited number of workstations but an unlimited number of people who want to play games, send e-mail, or chat. The library has formed a policy which says these activities are prohibited. Can you implement this policy?
  • You want to dedicate some of your workstations to just library catalog, database and/or e-journal access. Can that be done without hovering over patrons?

Remote Resource Access

  • You have invested all of this money in electronic resources, but database vendors offer only two ways to provide access: issuing usernames and passwords, or restricting access to a range of Internet addresses. Distributing passwords to all of your clients is awkward, yet they all want access from their homes, offices, or foreign countries. Can you get them access?

Definitions

In the beginning...

User
A person making a request.
Resource
A document, graphic, file, or other object, stored or generated, which can be addressed with a URL.
Entity
The content transferred between clients and servers, encompassing the metadata and the resource itself.
URL
Uniform Resource Locator -- the address of a resource on the network.

Roles

Client
Software that will retrieve an entity from a network server.
Server
Software that accepts a request for an entity and returns the entity to the client.
Origin Server
The server which holds the original copy of an entity.
Request
The formal process by which a client asks for an entity from a server.
Response
The formal process of a server returning an entity or information about an entity to a client.

Proxies

Proxy
A server which accepts entity requests from clients and retrieves that entity from the origin server or another proxy server.
Transparent Proxy
A proxy that passes requests and responses unmodified, except as required for proxy authentication.
Non-transparent Proxy
A proxy that somehow modifies the request or response to provide some added value to the client or user.
Cache
A collection of entities (resources and metadata) that can be used as responses to client requests.
Caching Proxy
A proxy server with a cache. Sometimes referred to as "proxy caches" or simply "caches", but the term "proxy" is often misinterpreted to include the "caching" component.

Authentication/Authorization

Authentication
"The process where a network user establishes a right to an identity." (Lynch)
Authorization
"The process of determining whether an identity ... is permitted to perform some action." (Lynch)
Access management
"Systems that may make use of both authentication and authorization services in order to control use of a networked resource." (Lynch)
Credentials
The information (for example, a login and password) that a user presents to establish an identity and, with it, the right to perform a function.
Firewall
Hardware and/or software used to protect hosts on one network segment from accesses on another.

Proxies for Bandwidth Conservation

Set up proxy server

Building Apache

+--------------------------------------------------------+
| You now have successfully built and installed the      |
| Apache 1.3 HTTP server. To verify that Apache actually |
| works correctly you now should first check the         |
| (initially created or preserved) configuration files   |
|                                                        |
|   /usr/local/apache-proxy/conf/httpd.conf              |
|                                                        |
| and then you should be able to immediately fire up     |
| Apache the first time by running:                      |
|                                                        |
|   /usr/local/apache-proxy/bin/apachectl start          |
|                                                        |
| Thanks for using Apache.       The Apache Group        |
|                                http://www.apache.org/  |
+--------------------------------------------------------+
  1. Download http://apache.oregonstate.edu/httpd/httpd-2.0.45.tar.gz
  2. tar xzf httpd-2.0.45.tar.gz
  3. cd httpd-2.0.45
  4. ./configure --with-layout=Apache --prefix=/usr/local/apache-proxy --enable-proxy --enable-proxy-http --enable-shared=max
  5. make
  6. make install

Apache httpd.conf configuration

#
# Proxy Server directives. Uncomment the following lines to
# enable the proxy server:
#
<IfModule mod_proxy.c>
    ProxyRequests On
    
    Listen 4545
    <Proxy *>
        Order deny,allow
        Deny from all
        Allow from .example.com
    </Proxy>

    #
    # Enable/disable the handling of HTTP/1.1 "Via:" headers.
    # ("Full" adds the server version; "Block" removes all outgoing Via: headers)
    # Set to one of: Off | On | Full | Block
    #
    ProxyVia On

    #
    # To enable the cache as well, edit and uncomment the following lines:
    # (no cacheing without CacheRoot)
    #
    CacheRoot "/usr/local/apache-proxy/proxy"
    CacheSize 5
    CacheGcInterval 4
    CacheMaxExpire 24
    CacheLastModifiedFactor 0.1
    CacheDefaultExpire 1
    #NoCache a_domain.com another_domain.edu joes.garage_sale.com

</IfModule>
# End of proxy directives.
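
The three expiry directives above interact in a way that is easier to see with numbers. The sketch below assumes the values are hours, as in the Apache 1.3 mod_proxy documentation, and that the expiry estimate is the time since last modification multiplied by CacheLastModifiedFactor, capped by CacheMaxExpire. Using the default whenever no modification time is known is a simplification, and the function name is hypothetical.

```python
def estimated_expiry_hours(hours_since_modified=None,
                           last_modified_factor=0.1,   # CacheLastModifiedFactor
                           max_expire=24.0,            # CacheMaxExpire
                           default_expire=1.0):        # CacheDefaultExpire
    """Estimate how long a cached document is treated as fresh, in hours."""
    if hours_since_modified is None:
        # No modification time available: fall back to the default.
        return default_expire
    estimate = hours_since_modified * last_modified_factor
    # The estimate can never exceed the configured maximum.
    return min(estimate, max_expire)


print(estimated_expiry_hours(10))    # document modified 10 hours ago
print(estimated_expiry_hours(1000))  # document modified about six weeks ago
print(estimated_expiry_hours())      # no modification time known
```

With the settings shown in the configuration, a document last changed 10 hours ago is kept for about an hour, while a document that has not changed in weeks is still rechecked after 24 hours.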

Starting Apache

  1. From the root directory of the Apache installation, run: bin/apachectl start

Client configuration




Demonstration

To verify...
http://whatismyip.com/

How Proxies Work

Networking overview

Data Link (or Network Interface) Layer

Where the bits meet the wire...

Standards
Ethernet, FDDI, ATM, Wireless, Dialup, ISDN, DSL

Internet Layer

The IP (Internet Protocol) of TCP/IP

Addressing
The uniquely defined IP address for each machine.
Example
123.234.123.234

Transport Layer

The TCP (Transmission Control Protocol) of TCP/IP

Addressing
"Ports" for each service on a machine
Example
80: http services
Also found
UDP: User Datagram Protocol

Application Layer

Enough already! Let's get some work done...
Programs that ask for and receive services from the network
Web clients/servers, e-mail, FTP, RealAudio, etc.

Layers on top of layers

  • Data Link Layer
  • Internet Layer
  • Transport Layer
  • Application Layer
  1. On which layer would you find switches, hubs, and repeaters?
  2. On which layer would you find routers?
  3. On which layer would you find firewalls and gateways?
  4. On which layer would you find proxies?

Overview of HTTP/1.0 and HTTP/1.1 protocols

Formats of messages

  • HTTP messages have the following structure
    1. Request/Response Line
    2. Zero or more headers
    3. A blank line
    4. Zero or one entities
  • Request lines have the form: <Method> <Address> <Protocol-Version>
  • Response lines have the form: <Protocol-Version> <Response-code> <Text-Message>
  • Header lines have the form: <Header-name>:  <Value>
  • Lines end with Carriage-Return/Line-Feed combinations.

Examples of messages

Request:

    GET /site-map.html HTTP/1.0
    Host: www.college.edu
    

Response:

    HTTP/1.0 200 OK
    Date: Wed, 19 Apr 2000 16:37:29 GMT
    Server: Apache/1.3.12 (Unix) PHP/3.0.16
    Content-Type: text/html
    
    <HTML>
    <HEAD>
    ...
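
The message structure described above (start line, headers, blank line, optional entity) can be pulled apart with a few lines of code. This is a simplified sketch with a hypothetical function name; it ignores folded headers and repeated header names.

```python
def parse_http_message(raw):
    """Split a raw HTTP message into start line, headers, and entity."""
    # The first blank line (CRLF CRLF) separates the headers from the entity.
    head, _, entity = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, entity


response = (
    "HTTP/1.0 200 OK\r\n"
    "Date: Wed, 19 Apr 2000 16:37:29 GMT\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<HTML>..."
)
start_line, headers, entity = parse_http_message(response)
version, code, text = start_line.split(" ", 2)
print(code, headers["content-type"])
```

The same function works on requests, since both kinds of message share the layout; only the start line differs.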

Listing of HTTP Response Codes

1xx: Informational
Request received, continuing process
2xx: Success
The action was successfully received, understood, and accepted
200 OK
3xx: Redirection
Further action must be taken in order to complete the request
301 Moved Permanently
304 Not Modified
305 Use Proxy
4xx: Client Error
The request contains bad syntax or cannot be fulfilled
401 Unauthorized
403 Forbidden
404 Not Found
407 Proxy Authentication Required
5xx: Server Error
The server failed to fulfill an apparently valid request

Adapted from RFC2616

Format of Proxy Request messages

  • Similar to requests to the origin server, except that the protocol scheme and host/port are included in the address on the request line
  • Specialized proxy-related headers can also be used.
  • Request lines have the form: <Method> <Address> <Protocol-Version>
    GET http://www.college.edu/site-map.html HTTP/1.0
    Host: www.college.edu
    Cache-control: no-cache
    
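
The difference between the two request forms can be sketched as follows. The helper name request_lines is hypothetical; the point is only that a proxy receives the full URL in the request line while an origin server receives just the path.

```python
from urllib.parse import urlsplit


def request_lines(url, via_proxy, version="HTTP/1.0"):
    """Build the lines of a GET request for a URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # A proxy needs the full URL to know which origin server to contact;
    # an origin server only needs the path.
    address = url if via_proxy else path
    return ["GET %s %s" % (address, version),
            "Host: %s" % parts.netloc,
            ""]  # the blank line that ends the headers


url = "http://www.college.edu/site-map.html"
print("\r\n".join(request_lines(url, via_proxy=False)))
print("\r\n".join(request_lines(url, via_proxy=True)))
```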

Cache headers

General

Headers that can be used in either HTTP Requests or Responses.

Date
Date and time the message was created. Format of the field must be as described in RFC1123 (e.g. "Tue, 15 Nov 1994 08:12:31 GMT").
Pragma
Used to pass special directives in request and response messages. One commonly used Pragma header is "no-cache", but this is being phased out in favor of the "Cache-control" header.
Via
Indicates the chain of proxy servers used to forward the message. Proxy servers must specify the protocol/version and the hostname or pseudonym of the server.

Entity

Headers which describe a resource (can be used in either HTTP Requests or Responses).

ETag
The "Entity Tag" for the resource. Entity tags are unique identifiers for a specific version of a resource, and can be used with the "If-Match" and "If-None-Match" request headers to determine when a resource changes.
Expires
Specifies the date and time the entity expires. Cached copies of an entity should not be used after this time without revalidation.
Last-modified
Specifies the last modification date and time of the resource on the origin server.

Request

Headers used only in HTTP Requests.

Host
The Internet (DNS) host name and port number, taken from the URL of the resource being requested, of the server servicing the request.
If-modified-since
Used in a request to make it conditional: if the requested resource has not been modified since the time specified in this field, the resource will not be returned from the server; instead, a 304 "Not modified" response will be returned without any message-body.
If-match
In combination with the "ETag" entity header, the server will return a 412 "Precondition failed" if the ETag of the entity being requested is different from the ETag of the entity on the server.
If-none-match
The inverse of the "If-match" header operation. If the ETag of the entity being requested matches the ETag of the entity on the server, the server returns a 304 "Not Modified" status.
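
The conditional headers above amount to a small decision procedure on the server. The sketch below is a simplification under stated assumptions: it handles a single ETag per header (no lists, no "*"), and both function names are hypothetical.

```python
from email.utils import parsedate_to_datetime


def status_for_etag(current_etag, if_match=None, if_none_match=None):
    """Status code a server would return for an ETag-based conditional GET."""
    if if_match is not None and if_match != current_etag:
        return 412  # Precondition Failed
    if if_none_match is not None and if_none_match == current_etag:
        return 304  # Not Modified: the cached copy is still good
    return 200      # send the full entity


def status_for_if_modified_since(last_modified, if_modified_since):
    """Status code for a date-based conditional GET (RFC1123 date strings)."""
    modified = parsedate_to_datetime(last_modified)
    since = parsedate_to_datetime(if_modified_since)
    return 304 if modified <= since else 200


print(status_for_etag('"v2"', if_none_match='"v2"'))
print(status_for_if_modified_since("Thu, 20 Apr 2000 08:00:00 GMT",
                                   "Wed, 19 Apr 2000 16:37:29 GMT"))
```

A caching proxy uses exactly these requests to revalidate a stored entity: a 304 response means the stored copy can be served without transferring the entity again.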

Cache control

The "Cache-control" general header was introduced in HTTP/1.1 to consolidate and further refine the caching policies and requests in client caches and proxy caches. The "Cache-control" header defines different directives depending on whether it is used in a request or response context.

Cache control request directives

no-cache
Requests an end-to-end revalidation -- the origin server should be reached through the chain of proxies to determine whether the entity is up-to-date.
no-store
An intermediate proxy must not store any part of the request or response on non-volatile (e.g. disk) media.
max-age=seconds
The client specifies a maximum age of the entity (in seconds) that it will accept out of a proxy's cache before the origin server must be contacted.
max-stale
The client specifies that it is willing to accept an entity that the cache has determined is past its freshness lifetime.
max-stale=seconds
As above, but the client specifies a maximum number of seconds beyond a freshness lifetime.
min-fresh=seconds
The client requires that the entity in the proxy cache must have at least the specified number of seconds left before the freshness lifetime expires.
only-if-cached
Requests that the proxy return the entity only if it can be served from the proxy's cache.

Cache control response directives

public
The server specifies that the response is cacheable in any cache (client or proxy cache).
private
The response is intended for the specific client only and cannot be cached by any shared caches.
no-cache
The response may be stored, but a cache must not use it to satisfy a later request without first revalidating it with the origin server.
no-store
The response cannot be stored on any non-volatile media. This usually means that the entity can only be stored in memory and never to disk, where it is susceptible to compromise.
no-transform
Intermediate proxy servers must not perform any transformations on the entity.
max-age=seconds
The origin server specifies a freshness lifetime for the entity, overriding lifetime values determined by the proxy caches.
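
A proxy has to read these directives before deciding what to do with a response. The following is a minimal sketch, not a complete parser: it splits on commas, so it would mishandle directives that carry a quoted, comma-separated list of field names. All function names are hypothetical.

```python
def parse_cache_control(value):
    """Parse a Cache-control header value into a dict of directives."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition("=")
        # Directives without a value (e.g. "public") are stored as True.
        directives[name.lower()] = arg.strip('"') if sep else True
    return directives


def shared_cache_may_store(directives):
    """True if a shared (proxy) cache is allowed to keep the response."""
    return not ("no-store" in directives or "private" in directives)


def freshness_lifetime(directives, default_seconds):
    """Seconds the response stays fresh; max-age overrides the proxy's own value."""
    if "max-age" in directives:
        return int(directives["max-age"])
    return default_seconds


d = parse_cache_control("public, max-age=3600")
print(d, shared_cache_may_store(d), freshness_lifetime(d, 600))
```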

Demonstrations

A look at the headers and cacheability parameters of three sites: www.ala.org, www.whitehouse.gov, and www.cnn.com.

Display HTTP Headers

View of HTTP headers using the services of http://www.web-caching.com/showheaders.html.

Cacheability Query

Overview of cacheability of HTML and referenced resources using the services of http://www.ircache.net/cgi-bin/cacheability.py.

Local PC Setup

Manual Configuration

Proxy Auto-Configuration Files

Format

  • A piece of JavaScript code which takes the form:
            function FindProxyForURL(url, host)
            {
                ...
            }
  • This file is stored on a web server with the extension .pac
  • The web server must be configured to map the .pac extension to the application/x-ns-proxy-autoconfig MIME type
    • For the Apache web server, add an AddType directive: AddType application/x-ns-proxy-autoconfig .pac

Programming

  • The FindProxyForURL function receives two parameters for each URL requested: "url" and "host"
    url
    The complete URL being requested.
    host
    The hostname extracted from the URL. This is the exact same host listed in the URL. The port number is not included (it can be extracted from the URL if needed).
  • The FindProxyForURL function must return a string in one of three formats:
    DIRECT
    The request should go directly to the origin server.
    PROXY host:port
    The request should go to the specified proxy.
    SOCKS host:port
    The specified SOCKS server should be used.
  • More than one method may be used; separate different methods by semicolons
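
The return string is easy to take apart, which helps when checking a PAC file by hand. The sketch below uses a hypothetical function name and Python rather than the browser's own code; browsers try the listed methods from left to right and move to the next one when a proxy cannot be reached.

```python
def parse_pac_result(result):
    """Turn a FindProxyForURL return string into an ordered list of methods."""
    methods = []
    for entry in result.split(";"):
        fields = entry.split()
        if not fields:
            continue  # ignore empty entries such as a trailing semicolon
        kind = fields[0].upper()
        target = fields[1] if len(fields) > 1 else None
        methods.append((kind, target))
    return methods


print(parse_pac_result(
    "PROXY proxy1.college.edu:4545; PROXY proxy2.college.edu:4545; DIRECT"))
```

Ending the list with DIRECT, as here, lets the browser fall back to the origin server when every listed proxy is down.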

Helpful Pre-defined Functions

isPlainHostName(host)
True if and only if there is no domain name in the hostname (no dots).
host
the hostname from the URL (excluding port number).
shExpMatch(str, shexp)
Returns true if the string matches the specified shell expression.
str
is any string to compare (e.g. the URL, or the hostname).
shexp
is a shell expression to compare against.

Adapted from Navigator Proxy Auto-Configure File Format

Example -- All clients through one proxy server

  function FindProxyForURL(url,host) {
   // If the host requested on the URL line is not a FQDN
   // (eg, it is 'www'), then don't proxy.
    if (isPlainHostName(host)) {
      return "DIRECT";
    }
   
   // Otherwise, send through proxy
    return "PROXY proxy.college.edu:4545";
  }

Example -- Some clients through one proxy server for all requests

  function FindProxyForURL(url,host) {
   // If the host requested on the URL line is not a FQDN
   // (eg, it is 'www'), then don't proxy.
    if (isPlainHostName(host)) {
      return "DIRECT";
    }
   
   // Make OPAC stations go through Proxy Server
   // (look up the workstation's address once and reuse it)
    var ip = myIpAddress();
    if (ip == "10.243.20.242" ||
      ip == "10.243.21.210" ||
      ip == "10.243.21.241" ||
      ip == "10.243.22.13" ||
      ip == "10.243.22.19" ||
      ip == "10.243.22.35" ||
      ip == "10.243.22.41" ||
      ip == "10.243.22.182") { 
        return "PROXY proxy.college.edu:4545";
    }
     
   // Everyone else can go directly to the origin server
    return "DIRECT";
  }

Example -- All clients through two proxy servers for some requests

  function FindProxyForURL(url,host) {
   // If the host requested on the URL line is not a FQDN (eg, it is 'www'),
   // then don't proxy.
    if (isPlainHostName(host)) {
      return "DIRECT";
    }
   
   // Now do the list of IP-restricted services; they go through the proxy
    if (shExpMatch(host, "*eb.com") || shExpMatch(host, "*oclc.org")) {
      return "PROXY proxy1.college.edu:4545; PROXY proxy2.college.edu:4545";
    }
    
   // Otherwise, go directly to the origin server
    return "DIRECT";
  }
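
It can be useful to test PAC logic outside a browser before deploying it. The sketch below re-creates the last example in Python under two assumptions: that fnmatchcase from the standard library is a close enough approximation of the browser's shExpMatch for simple "*" patterns, and that the Python function names are hypothetical stand-ins for the PAC functions.

```python
from fnmatch import fnmatchcase


def is_plain_host_name(host):
    # Stand-in for isPlainHostName: true when the name contains no dots.
    return "." not in host


def sh_exp_match(text, pattern):
    # Approximation of shExpMatch using shell-style wildcards.
    return fnmatchcase(text, pattern)


def find_proxy_for_url(url, host):
    if is_plain_host_name(host):
        return "DIRECT"
    if sh_exp_match(host, "*eb.com") or sh_exp_match(host, "*oclc.org"):
        return "PROXY proxy1.college.edu:4545; PROXY proxy2.college.edu:4545"
    return "DIRECT"


for host in ("www", "www.eb.com", "firstsearch.oclc.org",
             "www.cnn.com", "www.web.com"):
    print(host, "->", find_proxy_for_url("http://%s/" % host, host))
```

Running it shows a side effect of the pattern "*eb.com": it also matches unrelated hosts such as www.web.com. A tighter pattern such as "*.eb.com" avoids that, at the cost of no longer matching the bare name eb.com.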

Automatic Detection

Proxies for Statistics

Set up proxy server

Statistics programs

Demonstration

4.54.39.182 - - [01/Aug/2000:00:17:10 -0500] "GET http://rave.ohiolink.edu/databases/login/abig HTTP/1.0" 302 296
4.54.39.182 - - [01/Aug/2000:00:17:11 -0500] "GET http://olc7.ohiolink.edu/cgi-bin/login/abig HTTP/1.0" 302 257
4.54.39.182 - - [01/Aug/2000:00:17:13 -0500] "GET http://olc7.ohiolink.edu/bin/gate.exe?f=login&p_lang=english&p_d=abig HTTP/1.0" 302 247
4.54.39.182 - - [01/Aug/2000:00:17:19 -0500] "GET http://olc7.ohiolink.edu/bin/gate.exe?f=search&state=3v0758.1.1 HTTP/1.0" 200 7164
4.54.39.182 - - [01/Aug/2000:00:17:20 -0500] "GET http://olc7.ohiolink.edu/style/dw.css HTTP/1.0" 304 -
4.54.39.182 - - [01/Aug/2000:00:17:40 -0500] "GET http://olc7.ohiolink.edu/bin/gate.exe?f=brwsidx&state=3v0758.1.1&p_IdxPara=JNBR&p_IdxTerm= HTTP/1.0" 200 2349
4.54.39.182 - - [01/Aug/2000:00:17:47 -0500] "GET http://olc7.ohiolink.edu/cgi-bin/submit-brws?f=brwsidx&state=3v0758.2.1&p_L=8&p_IdxTerm=The+Economist HTTP/1.0" 302 286
4.54.39.182 - - [01/Aug/2000:00:18:41 -0500] "GET http://olc7.ohiolink.edu/bin/gate.exe?f=brwsidx&state=3v0758.2.1&p_L=8&p_IdxTerm=The-Economist HTTP/1.0" 200 5323
4.54.39.182 - - [01/Aug/2000:00:19:14 -0500] "GET http://olc7.ohiolink.edu/bin/gate.exe?f=logout&state=3v0758.3.1&p_goto=cdb HTTP/1.0" 302 224

Privacy Statements

What is collected?

4.54.39.182 - - [01/Aug/2000:00:17:19 -0500] "GET http://olc7.ohiolink.edu/bin/gate.exe?f=search&state=3v0758.1.1 HTTP/1.0" 200 7164

Portions adapted from Apache Module mod_log_common documentation.
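
Each entry in the log shown above follows the Common Log Format: client host, identity (usually "-"), authenticated user, timestamp, request line, status code, and bytes sent. A minimal sketch for splitting an entry into those fields follows; the pattern and function name are illustrative and are not part of any statistics package.

```python
import re

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_log_line(line):
    """Return the fields of one Common Log Format entry, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # A "-" in the size column means no entity was sent (e.g. a 304 response).
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


line = ('4.54.39.182 - - [01/Aug/2000:00:17:19 -0500] '
        '"GET http://olc7.ohiolink.edu/bin/gate.exe?f=search&state=3v0758.1.1 HTTP/1.0" '
        '200 7164')
entry = parse_log_line(line)
print(entry["host"], entry["status"], entry["bytes"])
```

Seeing the fields separated makes the privacy question concrete: the client address, the time, and the full URL (including search parameters) are all recorded for every request.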

Sensitivity of information

Forming a privacy statement

Disposition of Log Files

  • Adapted Trust*E section on log files:
    Log Files
    We use IP addresses to analyze trends, administer the site, and gather broad demographic information for aggregate use. IP addresses are not linked to personally identifiable information. Your login ID/barcode may be linked to specific log entries, but this identification is removed before statistics are generated.

Use of logs to identify problems and service abuses

Proxies for Filtering

What this section is...
...a discussion of technology solutions to fulfill policy requests
What this section is not...
...a debate about the pros and cons of Internet filtering
...a review of strictly content-based filters for library workstations

Fake a proxy server to prevent access to all but authorized sites

1. Invalid Proxy Server

  • Put in an invalid proxy server and then "exclude" the lists of sites for which you want to allow access. (Andrew Mutch of The Library Network [Michigan])
  • Users receive a browser message that the host name "This host is limited to..." could not be found

2. Fake Proxy Server

  • #1 is good, but the error message is vague and confusing.
  • Run a fake proxy server on a specific port on a UNIX box which simply displays an HTML page.
    1. Create an HTTP-response-in-a-file (/usr/local/sorry.cat-html in this example):
      HTTP/1.0 200 OK
      Content-type: text/html
      
      <HTML>
      <HEAD><TITLE>Can't go there</TITLE></HEAD>
      <BODY><P>Sorry -- you can't get there from this workstation.</P></BODY>
      </HTML>
      
    2. Add a line to your services file: fakeproxy 8080/tcp
    3. Add a line to your inetd.conf file:
      fakeproxy stream tcp nowait httpusr /bin/cat cat /usr/local/sorry.cat-html
      ...and restart your inetd server with a HUP signal.
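
On a machine without inetd, the same trick can be approximated with a small script. This is a sketch, not a tested replacement for the inetd setup: the handler class and the serve helper are hypothetical names, and the port is the one assumed in the services file entry above.

```python
import socketserver

# The same canned response that the inetd version sends with "cat".
SORRY_PAGE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/html\r\n"
    b"\r\n"
    b"<HTML>\n"
    b"<HEAD><TITLE>Can't go there</TITLE></HEAD>\n"
    b"<BODY><P>Sorry -- you can't get there from this workstation.</P></BODY>\n"
    b"</HTML>\n"
)


class SorryHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Read the request line, then answer every request with the same page.
        self.rfile.readline()
        self.wfile.write(SORRY_PAGE)


def serve(port=8080):
    # Blocks until interrupted, so it is defined here but not called.
    with socketserver.TCPServer(("", port), SorryHandler) as server:
        server.serve_forever()
```

Calling serve() from a startup script would make the machine answer every proxied request with the "sorry" page, whatever URL was requested.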

3. Use a PAC file

  • #2 is better, but whenever you add or remove services you have to visit every machine
  • Create a Proxy Auto-Config file which only allows access to particular sites
      if (shExpMatch(host, "*eb.com") || shExpMatch(host, "*oclc.org")) {
        return "DIRECT";
      }

      return "PROXY www.institution.org:8080";

Gaming / Web-based E-mail, Chat / etc.

Setting up Apache to block requests to certain sites

ProxyBlock
Syntax
ProxyBlock <word/host/domain list>
Compatibility
ProxyBlock is only available in Apache 1.2 and later.

The ProxyBlock directive specifies a list of words, hosts and/or domains, separated by spaces. HTTP, HTTPS, and FTP document requests to sites whose names contain matched words, hosts or domains are blocked by the proxy server. The proxy module will also attempt to determine IP addresses of list items which may be hostnames during startup, and cache them for match test as well. Example:

  ProxyBlock joes-garage.com some-host.co.uk rocky.wotsamattau.edu

'rocky.wotsamattau.edu' would also be matched if referenced by IP address.

Adapted from Apache Module mod_proxy documentation.
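
The matching rule described above (a request is blocked when the site name contains any listed word, host, or domain) can be sketched in a few lines. This illustrates the rule only, not Apache's implementation: the function name is hypothetical, and the startup lookup of IP addresses is left out.

```python
def is_blocked(host, block_list):
    """True if the host name contains any word, host, or domain in the list."""
    host = host.lower()
    return any(item.lower() in host for item in block_list)


BLOCKED = ["joes-garage.com", "some-host.co.uk", "rocky.wotsamattau.edu"]

print(is_blocked("www.joes-garage.com", BLOCKED))
print(is_blocked("www.college.edu", BLOCKED))
```

Because the test is a substring match, a short word in the list blocks every host whose name contains it, so list entries are best kept as specific as possible.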

Other forms of filtering

Filtering based on Content-type headers; browser string headers
Prevent certain file types from being downloaded or stop specified web browsers from functioning.
Removing Cookie headers, ad graphics
Block personalization functions and web advertising. Protect privacy by removing identifying information from requests.
Virus protection on web requests
Scan incoming files for viruses to prevent them from being downloaded to local machines.
Character code translation
Translate web pages on the fly into different character sets.

Authentication systems

How web clients authenticate to servers

  1. The client makes a normal request for a page. The server determines that authentication is required for that page.
  2. The server returns a 401 "Unauthorized" response containing a WWW-Authenticate header, and the browser displays a login box with the realm string supplied by the server.
        WWW-Authenticate: Basic realm="WallyWorld"
  3. The browser accepts the login and password from the user, creates a string in the form "<login>:<password>", encodes it with Base-64, and sends that in an Authorization header back to the server with the same URL request.
        Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
  4. The server decodes the Base-64 string, separates the login and password, and checks the credentials.
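
Steps 3 and 4 can be reproduced with the standard library. The function names below are hypothetical; the header value is the one shown in step 3.

```python
import base64


def encode_basic(login, password):
    """Build the value of an Authorization header for Basic authentication."""
    token = base64.b64encode(("%s:%s" % (login, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic(header_value):
    """Recover the login and password from an Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    # Only the first colon separates login from password.
    login, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return login, password


print(decode_basic("Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="))
```

Base-64 is an encoding, not encryption: anyone who can see the header can recover the password just as the server does.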

Sources for authentication

What can Apache handle?

Search the Apache Module Registry for "authentication":
http://modules.apache.org/search?search=Authentication&query=true

  • NT Domain
  • LDAP
  • Kerberos
  • Radius
  • TACACS+
  • Various databases
  • "External Authentication"

Proxies for Remote Resource Access

Basic Theory

Is this a "reverse proxy"?

No. The term "Reverse Proxy" is used to describe the functions of a proxy server that sits between the Internet and the origin server.

Authentication step

Transparent versus non-transparent proxy servers

Transparent Proxy
A proxy that passes requests and responses unmodified, except as required for proxy authentication.
Non-transparent Proxy
A proxy that somehow modifies the request or response to provide some added value to the client or user.
Rewriting Proxies
A special form of the Non-transparent Proxy server which examines the URLs in HTML documents passing through the proxy, and rewrites them to point back to the proxy server. For example,
   http://firstsearch.oclc.org/dbname=WorldCat;graphics=low;FSIP
becomes one of
   http://proxy.college.edu/firstsearch/dbname=WorldCat;graphics=low;FSIP
   http://proxy.college.edu:2049/dbname=WorldCat;graphics=low;FSIP
   http://80-firstsearch.oclc.org.proxy.college.edu/dbname=WorldCat;graphics=low;FSIP

Advantages, Disadvantages

Transparent Proxy servers...
...are less computing intensive because they do not examine the content of each HTML page.
...are easier to program than Rewriting Proxies.
...require users to reconfigure their browsers (education problem).
...may not work with some corporate or commercial Internet Service Providers.
Rewriting Proxy servers...
...require no changes to the user's browser and work with browsers on firewalled networks.
...are sensitive to "incorrect" HTML.
...may not work with sites using sophisticated JavaScripts.

EZproxy Demonstration

  1. mkdir /usr/local/ezproxy
  2. cd /usr/local/ezproxy
  3. Download http://www.usefulutilities.com/ezproxy/ezproxy.bin
  4. mv ezproxy.bin ezproxy
  5. chmod 755 ezproxy
  1. ./ezproxy -m
  2. ./ezproxy -c
  3. ./ezproxy
  4. Point web browser to http://proxy.college.edu:2048/
    Login/Password: testuser/testpass

What's happening here?

URLs are being rewritten on the page to point back through the EZproxy server.

http://www.altavista.com/ maps to
   http://concerto.law.uconn.edu:2050/

http://doc.altavista.com/help/search/search_help.shtml maps to
   http://concerto.law.uconn.edu:2052/help/search/search_help.shtml

http://shopping.altavista.com/ maps to
   http://concerto.law.uconn.edu:2054/

http://tools.altavista.com/ maps to
   http://concerto.law.uconn.edu:2055/

Mappings are stored in the "ezproxy.hst" file.

EZproxy's new scheme

Last year, a version of EZproxy was released with a new scheme for rewriting URLs. Using wildcard DNS entries and the "Host:" header, URLs can be rewritten as such:

http://www.altavista.com/ maps to
   http://80-www.altavista.com.ezproxy.law.uconn.edu/

http://doc.altavista.com/help/search/search_help.shtml maps to
   http://80-doc.altavista.com.ezproxy.law.uconn.edu/help/search/search_help.shtml
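
The hostname-based mapping shown above follows a regular pattern: the origin port, a hyphen, the origin host name, then the proxy's own domain. The sketch below reproduces that pattern; it is not EZproxy's code, the function names are hypothetical, and it assumes port 80 whenever the URL does not name a port.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_by_hostname(url, proxy_host):
    """Rewrite an origin URL so that it points back through the proxy."""
    parts = urlsplit(url)
    port = parts.port or 80  # assumption: plain HTTP when no port is given
    new_host = "%d-%s.%s" % (port, parts.hostname, proxy_host)
    return urlunsplit((parts.scheme, new_host, parts.path,
                       parts.query, parts.fragment))


def restore_origin(rewritten_url, proxy_host):
    """Recover the origin URL from a rewritten one (what the proxy must do)."""
    parts = urlsplit(rewritten_url)
    prefix = parts.hostname[: -len(proxy_host) - 1]  # drop ".<proxy_host>"
    port, _, origin_host = prefix.partition("-")
    netloc = origin_host if port == "80" else "%s:%s" % (origin_host, port)
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))


rewritten = rewrite_by_hostname(
    "http://doc.altavista.com/help/search/search_help.shtml",
    "ezproxy.law.uconn.edu")
print(rewritten)
print(restore_origin(rewritten, "ezproxy.law.uconn.edu"))
```

The wildcard DNS entry is what makes every such generated host name resolve to the proxy; the proxy then reads the "Host:" header to work out which origin server the request is really for.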

How is this better?

  • Eliminates non-standard ports; allows EZproxy services to be used through restrictive corporate firewalls
  • Reduces resource requirements for EZproxy server (fewer ports)

What do you give up?

  • Can no longer run EZproxy and a standard web server on the same machine

Adding a new site

  1. Edit ezproxy.cfg to add:
      T Database Title
      U http://url.to.database/search/
      D domains.used.by.vendor.com
      D another.domain.com
  2. Construct the URL on your web page http://proxy.college.edu:2048/login?url=http://somedb.com/search
  T LegalTRAC from Gale
  U http://infotrac.galegroup.com/itweb/nellco_main
  D galegroup.com

Adding authentication

By barcode pattern
Add line to ezproxy.usr:
28888#######::
By text file
Add line to ezproxy.usr:
::file=myusers.txt
By IMAP server login
Add line to ezproxy.usr:
::imap=imapserver.college.edu

Alternatives to Proxy Servers for remote resource access

Interception Proxies

Why do we care?

The good...
Requires no changes to client browser or the construction of special URLs.
...the bad...
According to network purists, an interception proxy violates the fundamental concept of the "invisible" network.
...and the ugly.
The installation of interception proxies breaks IP address recognition for access to remote databases.

How do they work?

Back to the discussion of network layers:

Layer                                    Addressing
Application Layer                        
Transport Layer                          Port
Internet Layer                           IP address
Data Link (or Network Interface) Layer   Ethernet address

Remember where proxy servers were located? Interception Proxies operate at a different location!

Follow the network path: normal web transaction

  1. Request leaves client machine destined towards origin server.
  2. Network routers and switches move the transaction to the origin server.
  3. Received by the origin server: the IP address of the request is that of the client machine.

Follow the network path: proxy server transaction

  1. Request leaves client machine destined towards proxy server (as directed by the browser configuration).
  2. Network routers and switches move the transaction to the proxy server.
  3. Received by the proxy server; request leaves proxy destined towards the origin server.
  4. Network routers and switches move the transaction to the origin server.
  5. Received by the origin server: the IP address of the request is that of the proxy server.

Follow the network path: normal web transaction with an interception proxy

  1. Request leaves client machine destined towards origin server (no changes to the browser configuration).
  2. Network routers and switches move the transaction towards the origin server, but one of the routers detects that the request is an HTTP transaction and, using a proprietary protocol, passes the request to the interception proxy.
  3. Received by the interception proxy server; request leaves proxy destined towards the origin server.
  4. Network routers and switches move the transaction to the origin server.
  5. Received by the origin server: the IP address of the request is that of the interception proxy server.
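
The three paths above differ in one respect that matters for licensed databases: the address the origin server sees. The sketch below models that with made-up addresses from documentation ranges; the licensed range, the addresses, and the function names are all hypothetical.

```python
import ipaddress

# Hypothetical address range registered with a database vendor.
LICENSED_RANGE = ipaddress.ip_network("192.0.2.0/24")


def address_seen_by_origin(client_ip, proxy_ip=None):
    """The origin sees the proxy's address whenever a proxy makes the request."""
    return proxy_ip if proxy_ip is not None else client_ip


def vendor_allows(source_ip):
    """IP-based access check as a vendor might perform it."""
    return ipaddress.ip_address(source_ip) in LICENSED_RANGE


campus_pc = "192.0.2.25"           # inside the licensed range
campus_proxy = "192.0.2.10"        # library proxy, inside the range
home_pc = "203.0.113.9"            # remote user, outside the range
isp_interceptor = "198.51.100.7"   # ISP interception proxy, outside the range

print(vendor_allows(address_seen_by_origin(campus_pc)))
print(vendor_allows(address_seen_by_origin(home_pc, campus_proxy)))
print(vendor_allows(address_seen_by_origin(campus_pc, isp_interceptor)))
```

The second case is why a library proxy gives remote users access; the third is why an interception proxy outside the registered range takes access away from on-campus users without any change on their machines.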

What to do?

  1. Ask your network staff if they are intending to install an interception proxy. Remind them of the effect of installing an interception proxy.
  2. Ask your network staff to ask your ISP if they have intentions of installing an interception proxy.
  3. Prepare a list of IP addresses for services which will need to be "excluded" from the interception proxy function.

Free Proxy Servers

For each package, we'll look at:

Availability
Proxy Type
Platforms
Pricing
Proxy Characteristics
Bandwidth Conservation
Statistics
Filtering
Remote Resource Access
Comments

Apache

"This module implements a proxy/cache for Apache. It implements proxying capability for FTP, CONNECT (for SSL), HTTP/0.9, and HTTP/1.0. The module can be configured to connect to other proxy modules for these and other protocols."
http://www.apache.org/docs/mod/mod_proxy.html

Availability

Proxy Type
Transparent Proxy, Non-transparent Proxy, Rewriting Proxy (with code development)
Platforms
Available pre-compiled for a wide variety of UNIX, Windows, Macintosh, and other operating systems
Pricing
Freely available -- no usage restrictions. Commercial support available.

Proxy Characteristics

Bandwidth Conservation
Yes.
Statistics
Yes.
Filtering
Yes, but limited to hosts/IP addresses. (See ProxyBlock in documentation.)
Apache's mod_proxy module is extendable with mod_perl to modify the outgoing request (for example, stripping off headers in order to create an anonymizing proxy) or to modify the returned page.
Remote Resource Access
Yes. Authentication via flat-file and UNIX database files.
Authorization extendable using Apache APIs.

Comments

  • May require knowledge of the chosen server platform in order to configure and support.
  • Best used if your site is already running the Apache web server.

Squid

"Squid is a high-performance proxy caching server for web clients, supporting FTP, gopher, and HTTP data objects."
http://www.squid-cache.org/Doc/FAQ/FAQ-1.html#ss1.1

Availability

Proxy Type
Transparent Proxy, Non-transparent Proxy, Rewriting Proxy (with code development)
Platforms
Available for a wide variety of UNIX platforms. Must be compiled.
Also available precompiled for Windows NT 4.0 and Windows 2000/XP/2003
Pricing
Freely available (GNU General Public License). Commercial support available.

Proxy Characteristics

Bandwidth Conservation
Yes.
Statistics
Yes. More than you could ever want.
Filtering
Yes. Very flexible.
Remote Resource Access
Yes.

Comments

  • Very powerful and complex software. It may be overkill for most situations.
  • Requires knowledge of UNIX environment and the process of compiling and installing programs under UNIX.

Libproxy

"Libproxy is a simple rewriting pass-through proxy system designed especially for libraries."
From the README file

Availability

Proxy Type
Rewriting Proxy
Platforms
UNIX. Based on Apache, Perl, Mod_Perl, and other open source tools.
Pricing
Free.

Proxy Characteristics

Bandwidth Conservation
No, but can be connected to a caching proxy server.
Statistics
Yes.
Filtering
No, but can be connected to a proxy server with filters.
Remote Resource Access
Yes.

Comments

  • Used at Brown University and elsewhere.
  • Full source included; can be modified to fit the local environment.
  • Requires knowledge of UNIX environment and the process of compiling and installing programs under UNIX. Knowledge of Perl and Apache will be helpful.

DeleGate

"DeleGate is a multi-purpose application level gateway, or a proxy server which runs on multiple platforms. DeleGate mediates communication of various protocols, applying cache and conversion for mediated data, controlling access from clients and routing toward servers. It translates protocols between clients and servers, merging several servers into a single server view with aliasing and filtering."
http://wall.etl.go.jp/delegate/

Availability

Proxy Type
Transparent Proxy, Non-transparent Proxy
Platforms
Unix, Windows and OS/2
Pricing
Freely available -- no usage restrictions.

Proxy Characteristics

Bandwidth Conservation
Yes.
Statistics
Yes.
Filtering
Yes. Includes a specialized (proprietary) programmable language to filter requests, responses, and entities.
Remote Resource Access
Unclear. Includes a feature called "Proxy by URL Redirection" which may form the basis of a rewriting proxy.

Comments

  • Much more than a web proxy. Also proxies "FTP, Telnet, NNTP, SMTP, POP, IMAP, LPR, LDAP, ICP, DNS, SSL, Socks, and more."
  • Active development (new version released this week) and active user mailing list.
  • Documentation lags behind released version.

Commercial Proxy Servers

Microsoft Internet Security and Acceleration (ISA) Server

"The enterprise firewall and Web cache server."
http://www.microsoft.com/isaserver/

Availability

Proxy Type
Transparent Proxy, Non-transparent Proxy, Rewriting Proxy (with code development)
Platforms
Windows 2000
Pricing
US $1,499

Proxy Characteristics

Bandwidth Conservation
Yes.
Statistics
Yes.
Filtering
Yes. Supports filtering by domain names.
Supports 3rd-party plug-ins with categorized lists of sites to be blocked.
Remote Resource Access
No.

Comments

  • Formerly the Microsoft Proxy Server
  • More than a proxy server:
    ISA Server includes an extensible, multilayer enterprise firewall featuring security with packet-, circuit-, and application-level traffic screening, stateful inspection, broad application support, integrated virtual private networking (VPN), system hardening, integrated intrusion detection, smart application filters, transparency for all clients, advanced authentication, secure server publishing, and more.

iPlanet Proxy Server (now Sun ONE Web Proxy Server)

"Acting as a network traffic manager, it reduces the number of requests to remote content servers and lessens network traffic. The result is that user wait times are lowered and network performance is boosted. The Sun ONE Web Proxy Server also provides a secure gateway for content distribution and acts as a control point for Internet traffic, making communications managed by the product not only efficient, but secure."
http://wwws.sun.com/software/products/web_proxy/home_web_proxy.html

Availability

Proxy Type
Transparent Proxy, Non-transparent Proxy, Rewriting Proxy (with code development)
Platforms
HP-UX, AIX, Solaris, Windows NT, Windows 2000 Server, Windows 2000 AS
Pricing
Unknown

Proxy Characteristics

Bandwidth Conservation
Yes.
Statistics
Yes.
Filtering
Yes (URLs, content, content types, and outgoing header filters)
Supports 3rd-party plug-ins with categorized lists of sites to be blocked.
Remote Resource Access
Yes. Supports LDAP-based authentication.

WinProxy

"WinProxy provides everything you need to simultaneously connect all your computers to the Internet through just one simple connection with your existing service provider."
http://www.winproxy.com/

Availability

Proxy Type
Transparent Proxy, plus Network Address Translation (NAT)
Platforms
Windows 95/98, or NT (3.51 or higher)
Pricing
$799.95 (unlimited user)

Proxy Characteristics

Bandwidth Conservation
Yes.
Statistics
Yes.
Filtering
Yes. Either "Site blacklisting" or "Site whitelisting"
Supports 3rd-party plug-ins with categorized lists of sites to be blocked.
Remote Resource Access
No.

Comments

  • Much more than a web proxy server. Also acts as a firewall and a gateway machine for an internal LAN.
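
The difference between "Site blacklisting" and "Site whitelisting" comes down to which answer is the default. The sketch below is a minimal Python illustration of that decision, not WinProxy's implementation; the function names and the rule that a listed domain also covers its subdomains are assumptions.

```python
from urllib.parse import urlsplit


def host_matches(host, sites):
    """True if host equals a listed site or is a subdomain of one (an assumption)."""
    host = host.lower()
    return any(host == site or host.endswith("." + site) for site in sites)


def is_allowed(url, mode, sites):
    """Decide whether a request may pass the filter.

    mode "blacklist": everything is allowed except the listed sites.
    mode "whitelist": everything is blocked except the listed sites.
    """
    host = urlsplit(url).hostname or ""
    listed = host_matches(host, sites)
    if mode == "blacklist":
        return not listed
    if mode == "whitelist":
        return listed
    raise ValueError("mode must be 'blacklist' or 'whitelist'")


# Hypothetical site list for illustration.
blocked = {"games.example", "chat.example"}
print(is_allowed("http://www.games.example/play", "blacklist", blocked))
print(is_allowed("http://catalog.library.example/", "blacklist", blocked))
```

A blacklist suits a policy that prohibits a few activities; a whitelist suits a workstation dedicated to a handful of resources, such as a catalog-only terminal.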

EZproxy

"EZproxy provides the easiest way for libraries to extend web-based licensed databases to their remote users."
http://www.usefulutilities.com/ezproxy

Availability

Proxy Type
Rewriting Proxy
Platforms
Linux, Windows NT
Custom compiling for other UNIX platforms available
Pricing
US $495 per server (plus sales tax in Arizona)

Proxy Characteristics

Bandwidth Conservation
No. No caching built in. EZproxy can be chained to another proxy server.
Statistics
Yes. Statistics stored in Common Logfile Format without usernames.
Filtering
No.
Remote Resource Access
Yes. Authentication via IP address, text file of usernames/passwords, FTP/IMAP/POP login, or an extensible API based on HTTP requests. Latest version also includes authentication by LDAP, Radius, and INNOPAC Patron-API as well as the ability to limit database access to specific groups of users.
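
Because the statistics are stored in Common Logfile Format, they can be summarized with a short script or any standard log analyzer. The following Python sketch counts requests per requested URL; the sample log lines and the function name are invented for illustration, and the exact contents of a real EZproxy log are not assumed here.

```python
import re
from collections import Counter

# Common Logfile Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def count_requests(lines):
    """Count log entries per request target, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = CLF_PATTERN.match(line.strip())
        if match is None:
            continue
        parts = match.group("request").split()
        if len(parts) >= 2:
            counts[parts[1]] += 1  # parts is [method, target, protocol]
    return counts


# Invented sample entries; the username field is "-" because none is logged.
sample = [
    '192.0.2.10 - - [10/Oct/2003:13:55:36 -0700] "GET http://db.example.com/search HTTP/1.0" 200 2326',
    '192.0.2.11 - - [10/Oct/2003:13:56:02 -0700] "GET http://db.example.com/search HTTP/1.0" 200 2326',
    '192.0.2.10 - - [10/Oct/2003:13:57:40 -0700] "GET http://journals.example.org/ HTTP/1.0" 304 -',
]
for target, hits in count_requests(sample).most_common():
    print(hits, target)
```

Counts like these help answer the budget question of which resources patrons actually use.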

Obvia

"Remote Database Access (RDA) Service: The Complete Solution for all your Remote Authentication Needs"
http://www.obvia.com/

Availability

Proxy Type
Rewriting Proxy
Platforms
Obvia-hosted, Windows NT
Pricing
Varies

Proxy Characteristics

Bandwidth Conservation
No.
Statistics
Yes. Comprehensive usage statistics module.
Filtering
No.
Remote Resource Access
Yes. Authentication via flat-file, ILS (DRA or INNOPAC), Kerberos, LDAP, POP3, Netware, Windows NT, or custom interface.

Comments

  • Offers turn-key service from one of their data centers (the ultimate in bandwidth-saving!).

Remote Patron Authentication from Dynix

"As libraries increasingly use Web-based content subscription services, they need to authenticate remote patrons who use the Web to access online resources. Remote Patron Authentication (RPA) from epixtech enables libraries to authenticate patrons outside a library facility before providing them access to restricted resources."
http://www.dynix.com/products/pac/index.asp

Availability

Proxy Type
Not a proxy
Platforms
"Core application: Web server with CGI 1.1 support"
"Reporting component: Intel-based Windows NT server, ODBC capable SQL database management system"
Pricing
Unknown

Proxy Characteristics

Bandwidth Conservation
No.
Statistics
Yes (summarized).
Filtering
No.
Remote Resource Access
Yes. Authentication via 3M SIP1/SIP2 or ILS patron authentication.

Comments

  • Client interacts directly with database vendor after authentication.
  • Provides access to vendor databases via three authentication methods:
    1. Referring URL
    2. URL-Embedded Username and Password
    3. Database Vendor provided Script
  • Clients view the database list in a framed or non-framed environment. JavaScript is required.
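
The first two authentication methods listed above can be sketched in a few lines of Python. This is an illustration only, not the RPA product's code: the function names, the query parameter names "user" and "password", and the example URLs are all assumptions. Note that a referring URL is supplied by the browser and can be forged, so a check like this is only as strong as the vendor chooses to treat it.

```python
from urllib.parse import urlencode


def referrer_is_trusted(referer, trusted_prefixes):
    """Vendor-side check: did the request come from a registered library page?"""
    if not referer:
        return False
    return any(referer.startswith(prefix) for prefix in trusted_prefixes)


def build_login_url(base_url, username, password):
    """Library-side link: embed the institution's credentials in the URL.

    The parameter names are hypothetical; each vendor defines its own.
    """
    return base_url + "?" + urlencode({"user": username, "password": password})


# Hypothetical values for illustration.
trusted = ["http://www.library.example.edu/rpa/"]
print(referrer_is_trusted("http://www.library.example.edu/rpa/databases.html", trusted))
print(build_login_url("http://db.example.com/login", "examplelib", "s3cret"))
```

In both cases the patron is authenticated by the library first, and only then sent on to the vendor.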

Web Access Management (WAM) from Innovative Interfaces

Availability

Proxy Type
Rewriting Proxy, Transparent Proxy
Platforms
Requires INNOPAC ILS software
Pricing
Varies

Proxy Characteristics

Bandwidth Conservation
No.
Statistics
Yes (summarized).
Filtering
No.
Remote Resource Access
Yes. Authentication by INNOPAC Patron Validation.

Comments

  • Product started as a rewriting proxy server.
  • Added a non-rewriting proxy server with a PAC file.
  • Now it is back to a rewriting proxy server.

Resources

Presentation web site
http://www.PandC.org/proxy/

Lists of Links

Authentication and Authorization list of links
http://library.smc.edu/rpa.htm
Access Log Analyzers
http://www.uu.se/Software/Analyzers/Access-analyzers.html
Apache Authentication Modules
http://modules.apache.org/search?search=Authentication&query=true
Google's List of Proxy Resources
http://directory.google.com/Top/Computers/Software/Internet/Servers/Proxy/
Yahoo List of Proxy Resources
http://dir.yahoo.com/Computers_and_Internet/Software/Internet/World_Wide_Web/Servers/Proxies/

Specific Documents

"Library Web Proxy Use Survey Results." Information Technology and Libraries 20, no. 4 (2001): 172-178.
http://www.ala.org/Content/NavigationMenu/LITA/LITA_Publications4/ITAL__Information_Technology_and_Libraries/Volume_20,_No__4,_December_2001.htm#anchor167252
Hypertext Transfer Protocol -- HTTP/1.1 (RFC 2616)
http://www.rfc-editor.org/rfc/rfc2616.txt
Internet Web Replication and Caching Taxonomy
http://www.rfc-editor.org/rfc/rfc3040.txt
Caching Tutorial for Web Authors and Webmasters
http://www.wdvl.com/Internet/Cache/
Pass-Through Proxying as a Solution to the Off-Campus Web-Access Problem
http://www.goerwitz.com/software/libproxy/docs/
Navigator Proxy Auto-Configure File Format
http://wp.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html
Blacklist of sites that provide email, chat, and game-playing
http://www.riverofdata.com/tools/blacklist.htm
How to Lock-In IP Addresses on Netscape Navigator
http://northville.lib.mi.us/tech/lockin.htm
How to Lock-In Web Addresses on Internet Explorer 5
http://tech.tln.lib.mi.us/lockinie.htm
The Web Proxy Auto-Discovery Protocol (expired Internet-Draft)
http://www.web-cache.com/Writings/Internet-Drafts/draft-ietf-wrec-wpad-01.txt
Luotonen, Ari, Web Proxy Servers (Prentice Hall, 1998).
http://www.amazon.com/exec/obidos/ASIN/0136806120/
CNI White Paper on Authentication and Access Management Issues in Cross-organizational Use of Networked Information Resources
http://www.cni.org/projects/authentication/authentication-wp.html
Trust*E Privacy Resource Guide
http://www.truste.org/bus/pub_resourceguide.html

Proxy-related tools

Display HTTP Headers
http://www.web-caching.com/showheaders.html
Cacheability Query
http://www.ircache.net/cgi-bin/cacheability.py

User Documentation

Proxy Auto-Configuration setup (selected sites)
Central Michigan University: http://ocls.cmich.edu/remoteindex.htm
Northwestern University: http://www.library.northwestern.edu/help/proxy/
Purdue University: http://www.lib.purdue.edu/info/offcamp/

Free Proxy Servers

Apache
http://httpd.apache.org/
Squid
http://www.squid-cache.org/
Libproxy
http://www.goerwitz.com/software/libproxy/dist/
DeleGate
http://www.delegate.org/delegate/

Commercial Proxy Servers

Microsoft ISA Server
http://www.microsoft.com/isaserver/
Sun ONE Web Proxy Server
http://wwws.sun.com/software/products/web_proxy/home_web_proxy.html
WinProxy
http://www.winproxy.com/
EZproxy from Useful Utilities
http://www.usefulutilities.com/ezproxy/
Remote Database Access from Obvia
http://www.obvia.com/
Remote Patron Authentication from epixtech
http://www.dynix.com/products/pac/index.asp
Web Access Management (WAM) from Innovative Interfaces
http://www.iii.com/products/millennium/digitalcollections.shtml#wam

Future Developments

Project Shibboleth

Shibboleth, a project of Internet2's Middleware Architecture Committee for Education, is investigating technology to support inter-institutional authentication and authorization for access to Web pages. The intent is to support, as much as possible, the heterogeneous security systems in use on campuses today, rather than mandating use of particular schemes like Kerberos or X.509-based PKI.
Adapted from http://middleware.internet2.edu/shibboleth/
Shibboleth \Shib"bo*leth\, n. [Heb. shibbōleth an ear of corn, or a stream, a flood.]
  1. A word which was made the criterion by which to distinguish the Ephraimites from the Gileadites. The Ephraimites, not being able to pronounce sh, called the word sibboleth. See Judges xii.
  2. Hence, the criterion, test, or watchword of a party; a party cry or pet phrase.
Adapted from http://middleware.internet2.edu/shibboleth/why-shibboleth.html and Webster's Revised Unabridged Dictionary (1913).

Project Goals

  • "...a standards based vendor independent web access control infrastructure that can operate across institutional boundaries."
  • "This project seeks to define [...] standards for the secure exchange of trusted interoperable information which could be used in authorization decisions."
  • "The goal is to develop and promulgate an architecture, which can then be used in a multi-vendor, open source, standards based environment."

What does it mean for us?

  • Information providers and network infrastructure groups come to an agreement (a protocol) on how to exchange authentication, authorization, and demographic information.
  • The user is in control over how much information is released about himself or herself on a provider-by-provider basis.
  • Eliminates the use of proxy servers for remote resource access.
  • Enhances statistics to show which demographic groups are using which resources.
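
The idea of releasing information on a provider-by-provider basis can be pictured as a simple filter. The Python sketch below is a conceptual illustration only; it is not the Shibboleth protocol or any part of its software, and the attribute names, provider names, and function name are hypothetical.

```python
def release_attributes(user_attributes, release_policy, provider):
    """Return only the attributes the user has agreed to release to this provider.

    A provider with no entry in the policy receives nothing.
    """
    allowed = release_policy.get(provider, set())
    return {
        name: value
        for name, value in user_attributes.items()
        if name in allowed
    }


# Hypothetical user record and per-provider release choices.
attributes = {
    "affiliation": "student",
    "department": "History",
    "email": "patron@example.edu",
}
policy = {
    "journals.example.org": {"affiliation"},
    "coursepacks.example.com": {"affiliation", "department"},
}
print(release_attributes(attributes, policy, "journals.example.org"))
print(release_attributes(attributes, policy, "unknown.example.net"))
```

A provider that receives only "affiliation" can still authorize access and report usage by demographic group without learning who the individual is.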

What is the status of the project?

  • A working group of Internet2 is designing the protocol.
  • A call for participants was released a year ago, and the "Club Shib" participants were selected.

Wrap-up / Evaluation