Library Proxy Use Survey Results

Table 1. Breakdown of respondents to survey.

Library Type    Count
Academic           60
Public              8
Other/None          6

Between November 16 and December 1, 2000, 74 libraries responded to a call for participation in a survey of library use of proxy web servers. A breakdown of library types is shown in Table 1. Three of the responding libraries are not using proxy servers at this time.

This report was last updated on 11 Dec 2000.


Proxy for Remote Resource Access

Table 2. Software packages used for Remote Resource Access.

Software                   Count
EZproxy                       26
Web Access Management         12
Squid                          7
Apache                         6
Microsoft Proxy                3
Netscape Proxy                 3
Delegate                       1
Home-grown                     1
Novell BorderManager           1
Combinations                   2

By far the most frequent reason for libraries to use web proxy servers is to enable off-network users to access vendor-provided resources. These resources are typically restricted to an institution's network addresses. Placing a proxy server within that range of addresses lets off-network users appear to database vendors as though they were coming from within the network.
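The address check described above can be sketched in a few lines of Python. This is only an illustration, not any vendor's actual logic: the campus range, the user address, and the proxy address are hypothetical values drawn from documentation-only networks, and the function names are invented for this example.

```python
import ipaddress
from typing import Optional

# Hypothetical campus address range (a documentation-only network).
CAMPUS_NETWORK = ipaddress.ip_network("192.0.2.0/24")


def vendor_allows(client_ip: str) -> bool:
    """Return True if the address falls inside the licensed campus range."""
    return ipaddress.ip_address(client_ip) in CAMPUS_NETWORK


def address_seen_by_vendor(user_ip: str, proxy_ip: Optional[str] = None) -> str:
    """When a request is relayed through a proxy, the vendor sees the proxy's address."""
    return proxy_ip if proxy_ip is not None else user_ip


home_user = "203.0.113.45"  # hypothetical off-network user
proxy = "192.0.2.10"        # hypothetical proxy inside the campus range

direct = vendor_allows(address_seen_by_vendor(home_user))
via_proxy = vendor_allows(address_seen_by_vendor(home_user, proxy))
print("direct:", direct, "via proxy:", via_proxy)
```

The direct request is refused because the user's own address is outside the range, while the relayed request is accepted because the vendor only sees the proxy's address.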

Sixty-two libraries responded that they use proxies for remote resource access; the breakdown of software packages is shown in Table 2. EZproxy is the most popular, followed by Innovative Interfaces' proprietary Web Access Management. One library uses a combination of Apache and EZproxy, and another uses Microsoft Proxy and Netscape Proxy. In addition, a public library had been using Web Access Management but is planning to install EZproxy; no proxy server is in place there at this time.

Some proxy servers for remote resource access require users to configure their browsers to take advantage of the proxy service. Several libraries supplied URLs to documentation explaining this reconfiguration; these were the best in the author's opinion:

Central Michigan University
http://ocls.cmich.edu/remoteindex.htm
University of Waterloo
http://www.tug.uwaterloo.ca/proxy/
Tarleton State University
http://www.tarleton.edu/%7Elibrary/proxy/instructions1.htm

Several academic libraries have created their own "rewriting" proxy servers, often using existing free proxy servers as the basis. Rewriting proxy servers transform the HTML pages from vendor databases such that URLs on the page are rewritten to point back to the proxy server. One example is the University of Calgary, submitted by Eric Tull. EZproxy is an example of a commercial rewriting proxy server.
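As a rough illustration of the rewriting idea, the Python sketch below changes every href and src attribute on a page so that it points back at a proxy. It is not how EZproxy, the University of Calgary server, or any other product works; the proxy address, the "fetch?url=" scheme, and the function name are assumptions made for this example, and a production rewriter would need a real HTML parser rather than a regular expression.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical proxy entry point; real products use their own URL schemes.
PROXY_PREFIX = "http://proxy.example.edu/fetch?url="

# Matches double-quoted href and src attributes only (a simplification).
LINK_PATTERN = re.compile(r'(href|src)="([^"]*)"', re.IGNORECASE)


def rewrite_links(html: str, page_url: str) -> str:
    """Point every matched link back at the proxy."""

    def replace(match):
        attribute = match.group(1)
        # Resolve relative links against the page they came from.
        absolute = urljoin(page_url, match.group(2))
        encoded = quote(absolute, safe="")
        return '{}="{}{}"'.format(attribute, PROXY_PREFIX, encoded)

    return LINK_PATTERN.sub(replace, html)


page = '<a href="/search?q=owls">Search</a>'
rewritten = rewrite_links(page, "http://db.vendor.example.com/start")
print(rewritten)
```

Because each rewritten link leads back to the proxy, the user's browser never contacts the vendor directly and needs no special configuration.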

One academic library makes its authenticating proxy server available to other campus departments besides the library, but at this time only the main library and the law school library are using the service. The same library is considering expanding the proxy server use beyond authentication to bandwidth conservation.

Another academic library has set up its EZproxy server to allow access to IP-restricted resources on campus web servers in addition to vendor-provided databases. The types of resources made available in this fashion are antivirus software, campus network maps, and campus faculty committee documents. The systems librarian and campus webmaster seek out other campus web information to make available using this mechanism.


Proxy for Filtering

Table 3. Software packages used for Filtering.

Software                   Count
Microsoft Proxy                4
WinProxy                       3
Squid                          2
Apache                         1
Bess                           1
Netscape Proxy                 1
Novell BorderManager           1
WebManager                     1
Other -- More than one         2

Sixteen libraries use web proxies for filtering Internet stations. Software packages used are shown in Table 3. Proxy servers can examine the HTTP request from the client or response from the server, and modify the request before delivering it to the server or modify the response before returning it to the client.

Libraries use filtering proxies with allow lists (permitting access only to specified web sites), deny lists (blocking access to specified web sites), or both. Six libraries use deny lists only, three use allow lists only, and seven use both.
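A minimal Python sketch of how a filtering proxy might apply the two kinds of list follows. The function names, the host names, and the rule that the deny list is checked first are assumptions for illustration; none of the products in Table 3 is claimed to behave exactly this way.

```python
from urllib.parse import urlsplit


def host_matches(host, patterns):
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == p or host.endswith("." + p) for p in patterns)


def is_permitted(url, allow=None, deny=None):
    """Decide whether a request should be passed on to the web server."""
    host = (urlsplit(url).hostname or "").lower()
    # A deny list blocks the listed sites and lets everything else through.
    if deny and host_matches(host, deny):
        return False
    # An allow list permits only the listed sites.
    if allow is not None:
        return host_matches(host, allow)
    return True


deny_list = {"mail.example.com"}  # hypothetical web-based e-mail site
allow_list = {"example.edu"}      # hypothetical campus domain

blocked_mail = is_permitted("http://mail.example.com/inbox", deny=deny_list)
open_site = is_permitted("http://www.example.org/", deny=deny_list)
catalog = is_permitted("http://catalog.example.edu/", allow=allow_list)
outside = is_permitted("http://www.example.org/", allow=allow_list)
print(blocked_mail, open_site, catalog, outside)
```

A station restricted by an allow list can reach only the listed sites, while a station with only a deny list can reach everything except them.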

Dan Lester from Boise State University included in his survey response details about how his library uses WinProxy to deny access to web-based e-mail, gaming, and chat sites. The Boise State University setup is discussed at http://www.riverofdata.com/tools/blocking.htm. In addition, Lester edits a list of known web sites with web-based e-mail, gaming, and chat functions; libraries are encouraged to submit additions and corrections to the list. More details, including the list, can be found at http://www.riverofdata.com/tools/blacklist.htm.

A number of libraries are using plug-ins to Microsoft's Proxy server to do various forms of filtering. One public library is using the CyberPatrol plug-in to filter content on library stations. Another library is using a plug-in called "Websense" to provide optional filtering of sexual materials for its patrons. A multi-type library consortium is using the SmartFilter plug-in for Microsoft Proxy Server.

A public library uses a proxy to filter advertisements out of responses sent back from servers.

The University of Waterloo Library forces public library stations to use a proxy server. A router between the public library network and the campus network restricts HTTP requests to just the proxy server (in addition to other network restrictions). Stations must therefore use the proxy server to access web resources. The proxy server includes allow/deny directives that deny access to web-based e-mail services.


Proxy for Bandwidth Conservation

Table 4. Software packages used for Bandwidth Conservation.

Software                   Count
Microsoft Proxy                6
Netscape Proxy                 2
Squid                          2
CacheFlow 5000                 1
Cisco cache engine             1
Cobalt                         1
Novell BorderManager           1
Novell Internet Router         1
WinProxy                       1
Other -- More than one         1

Seventeen libraries use proxy servers for the traditional reason to install proxy servers -- bandwidth conservation; the proxy servers used by libraries are listed in Table 4. A caching proxy server stores web requests and responses; the cache is used to respond faster to subsequent requests without traversing the Internet connection.
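The caching behavior can be sketched in Python as follows. This is a deliberately simplified model: the class name, the stand-in origin function, and the URL are hypothetical, and a real caching proxy would also honor expiry and validation information rather than keeping responses forever.

```python
class CachingProxy:
    """Keep a copy of each response and reuse it for repeat requests."""

    def __init__(self, fetch):
        self.fetch = fetch  # function that contacts the origin web server
        self.cache = {}     # URL -> stored response body
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self.cache:
            # Served locally; the Internet connection is not used.
            self.hits += 1
            return self.cache[url]
        self.misses += 1
        body = self.fetch(url)
        self.cache[url] = body
        return body


origin_calls = []


def fake_origin(url):
    """Stand-in for a real web server; records each request it receives."""
    origin_calls.append(url)
    return "contents of " + url


proxy = CachingProxy(fake_origin)
first = proxy.get("http://www.example.org/logo.gif")
second = proxy.get("http://www.example.org/logo.gif")
print(len(origin_calls), proxy.hits, proxy.misses)
```

Only the first request reaches the origin server; the second is answered from the cache, which is where the bandwidth saving comes from.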

In a related response, one library reported that it employs a proxy server to reduce the load on an old, proprietary web server that cannot be replaced for several months. Because requests come through the proxy server first, requests for static content such as graphics and HTML files which don't regularly change can be handled by the proxy server rather than the old web server.

A number of the responses to this question are not traditional software proxies, but rather interception proxies. An interception proxy requires no changes to web clients; it operates instead at a network infrastructure level. Network routers and switches transparently redirect HTTP requests to the interception proxy, which either returns the response from its cache or contacts the web server for the response on behalf of the client. Of the responses to this question, CacheFlow 5000, Cisco cache engine, Cobalt, Novell BorderManager, and Novell Internet Router are interception proxies. (Cobalt and Novell Internet Router can also operate as traditional, non-interception proxies.)


Proxy for Gathering Statistics

Table 5. Software packages used for Statistics.

Software                   Count
Microsoft Proxy                4
EZproxy                        2
Netscape Proxy                 1
Novell Internet Router         1
Squid                          1
WinProxy                       1
Other -- More than one         1

Eleven libraries use proxy servers to gather statistics on web requests; the proxy servers used are listed in Table 5. By configuring OPAC stations to use a proxy server for requests to vendor databases, a library can get a rough gauge of database usage by examining the log files of the proxy server.
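As a sketch of the kind of rough gauge described above, the Python below counts requests per database host in a proxy log. The log layout is an assumption (lines resembling the Common Log Format with the full URL in the request field); each proxy product writes its own format, and the sample entries and host names are invented.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed request field: "METHOD full-URL HTTP/version"
REQUEST_PATTERN = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')


def count_requests_by_host(log_lines):
    """Tally how many logged requests went to each remote host."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_PATTERN.search(line)
        if match is None:
            continue  # skip lines that do not look like requests
        host = urlsplit(match.group(1)).hostname
        if host:
            counts[host] += 1
    return counts


# Invented sample entries from hypothetical OPAC stations.
SAMPLE_LOG = [
    '192.0.2.21 - - [01/Dec/2000:10:15:02 -0500] "GET http://db-one.example.com/search HTTP/1.0" 200 5120',
    '192.0.2.22 - - [01/Dec/2000:10:16:40 -0500] "GET http://db-two.example.com/ HTTP/1.0" 200 2048',
    '192.0.2.21 - - [01/Dec/2000:10:17:11 -0500] "GET http://db-one.example.com/record/7 HTTP/1.0" 200 3300',
    'unparseable line',
]

usage = count_requests_by_host(SAMPLE_LOG)
for host, total in usage.most_common():
    print(host, total)
```

Counts like these measure requests rather than searches or sessions, which is why they are only a rough gauge of database usage.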

In the survey, libraries were asked to identify what applications they used to create statistical reports from the proxy server logs. The applications listed were (each application was mentioned once unless otherwise noted):

One academic library uses a home-grown counter on links to databases; tracking the number of times the link is accessed gives the library an idea of how often databases are used.

Another academic library periodically uses Squid for in-house activity views. Only a limited, random number of sessions are examined.

One multi-type library consortium does not collect statistics. Log data whose collection cannot be disabled is automatically deleted by the system to protect users' privacy.


Other results

Respondents were asked if the library publishes a privacy policy regarding the use of proxy server log files. One library includes a statement regarding data collection and use on the proxy server login page:

"This information is collected under the Freedom of Information and Protection of Privacy Act. It is required to verify the identification of the researcher and to authorize access to the database. If you have any questions about the collection or use of this information, please contact Public Services Systems Librarian..."

Boise State University used proxy logs in the arrest of a patron who was viewing child pornography. Four days of logs were given to law enforcement personnel. No court order was needed to release the data because the library itself filed the complaint.

An academic library is installing a Virtual Private Network (VPN) so that off-campus clients on DSL and cable modem connections can access resources restricted by IP address. VPNs extend the institution's IP addresses to machines outside the local area network by tunneling network traffic through the general Internet. For more information about VPNs, see the Virtual Private Network article in Webopedia.

Conclusion

It comes as no surprise that proxy servers are most often used for remote resource access. Attendees at the three LITA "Proxy Web Servers and Authentication" workshops stated that learning about remote resource access was their primary reason for attending. In addition, the most common reason attendees had installed proxy servers before attending a workshop was to provide remote resource access.

One of the surprising outcomes of the survey is the use of interception proxies by libraries and institutions. Almost one quarter of the responses to the "Proxies for Bandwidth Conservation" question came from libraries using interception proxies. Although public libraries made up only about 10% of the survey responses, two of four interception proxy installations are in public libraries. Interception proxies have caused problems for libraries in the past -- especially when installed by Internet Service Providers (ISPs). Because the interception proxy changes the IP address of the client making the request (to the IP address of the ISP), these proxies cause problems for remote databases that use IP address recognition for authentication.