How many sites have deployed proxy servers now or are in the planning stages? For what reasons?
What drew you to this Regional Institute? What do you hope to get out of it at the end of the day?
A proxy is a service that sits between web servers (or, more accurately, "origin web servers") and clients. This service receives requests from clients and makes requests to servers on behalf of the clients.
A caching proxy, in addition to the above, saves a copy of the HTML pages, graphics, and other resources as they pass through.
+--------------------------------------------------------+
| You now have successfully built and installed the      |
| Apache 1.3 HTTP server. To verify that Apache actually |
| works correctly you now should first check the         |
| (initially created or preserved) configuration files   |
|                                                        |
|   /usr/local/apache-proxy/conf/httpd.conf              |
|                                                        |
| and then you should be able to immediately fire up     |
| Apache the first time by running:                      |
|                                                        |
|   /usr/local/apache-proxy/bin/apachectl start          |
|                                                        |
| Thanks for using Apache.       The Apache Group        |
|                                http://www.apache.org/  |
+--------------------------------------------------------+
#
# Proxy Server directives. Uncomment the following lines to
# enable the proxy server:
#
<IfModule mod_proxy.c>
ProxyRequests On
Listen 4545
<Directory proxy:*>
Order deny,allow
Deny from all
Allow from .your_domain.com
</Directory>
#
# Enable/disable the handling of HTTP/1.1 "Via:" headers.
# ("Full" adds the server version; "Block" removes all outgoing Via: headers)
# Set to one of: Off | On | Full | Block
#
ProxyVia On
#
# To enable the cache as well, edit and uncomment the following lines:
# (no cacheing without CacheRoot)
#
CacheRoot "/usr/local/apache-proxy/proxy"
CacheSize 5
CacheGcInterval 4
CacheMaxExpire 24
CacheLastModifiedFactor 0.1
CacheDefaultExpire 1
#NoCache a_domain.com another_domain.edu joes.garage_sale.com
</IfModule>
# End of proxy directives.
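The CacheLastModifiedFactor, CacheMaxExpire, and CacheDefaultExpire values above interact when an origin server supplies no expiry date. As the Apache 1.3 mod_proxy documentation describes it, the expiry period is estimated as the time since last modification multiplied by the factor, and CacheMaxExpire caps the result. The sketch below only illustrates that arithmetic with the values from this configuration; it is not Apache's code. The function and object names are hypothetical, and treating a missing modification time as the CacheDefaultExpire case is an assumption.

```javascript
// Illustrative arithmetic only; values are taken from the configuration above (hours).
const settings = { maxExpire: 24, lastModifiedFactor: 0.1, defaultExpire: 1 };

// hoursSinceModified: age of the document according to its Last-Modified date,
// or null when no modification time is available (assumption: this is the
// case where CacheDefaultExpire applies).
function estimateExpiryHours(hoursSinceModified, cfg) {
  if (hoursSinceModified === null) {
    return cfg.defaultExpire;
  }
  // expiry-period = time-since-last-modification * factor
  const estimate = hoursSinceModified * cfg.lastModifiedFactor;
  // CacheMaxExpire takes precedence when the estimate is longer.
  return Math.min(estimate, cfg.maxExpire);
}

console.log(estimateExpiryHours(10, settings));   // 1
console.log(estimateExpiryHours(1000, settings)); // 24
console.log(estimateExpiryHours(null, settings)); // 1
```

With these settings, a page last changed 10 hours ago is kept for about an hour, while a page untouched for weeks is still rechecked at least once a day.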
Where the bits meet the wire...
The IP (Internet Protocol) of TCP/IP
The TCP (Transmission Control Protocol) of TCP/IP
Adapted from RFC2616
Headers that can be used in either HTTP Requests or Responses.
Headers that describe a resource (can be used in either HTTP Requests or Responses).
Headers used only in HTTP Requests.
Headers used only in HTTP Responses.
The "Cache-control" general header was introduced in HTTP/1.1 to consolidate and further refine the caching policies and requests in client caches and proxy caches. The "Cache-control" header defines different directives depending on whether it is used in a request or response context.
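To make the request/response distinction concrete, here is a minimal sketch that splits a "Cache-control" header value into its directives. The function name is hypothetical, and the parser is deliberately simplified: it assumes no commas inside quoted values, which the full HTTP/1.1 grammar does allow.

```javascript
// Split a Cache-control header value into a map of directive -> value.
// Directives that carry no value (for example "no-cache") map to true.
function parseCacheControl(headerValue) {
  const directives = {};
  for (const part of headerValue.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) {
      directives[trimmed.toLowerCase()] = true;
    } else {
      const name = trimmed.slice(0, eq).trim().toLowerCase();
      // Strip optional surrounding double quotes from the value.
      const value = trimmed.slice(eq + 1).trim().replace(/^"|"$/g, "");
      directives[name] = value;
    }
  }
  return directives;
}

// Response context: any cache may store the response, fresh for 3600 seconds.
const response = parseCacheControl("public, max-age=3600");
// Request context: the client asks caches to revalidate with the origin server.
const request = parseCacheControl("no-cache");

console.log(response["max-age"]); // 3600
console.log(request["no-cache"]); // true
```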
A look at the headers and cacheability parameters of three sites: www.ala.org, www.whitehouse.gov, and www.cnn.com.
View of HTTP headers using the services of http://www.web-caching.com/showheaders.html.
Overview of cacheability of HTML and referenced resources using the services of http://www.ircache.net/cgi-bin/cacheability.py.
function FindProxyForURL(url, host)
{
...
}
Adapted from Navigator Proxy Auto-Configure File Format
function FindProxyForURL(url,host) {
// If the host requested on the URL line is not a FQDN
// (eg, it is 'www'), then don't proxy.
if (isPlainHostName(host)) {
return "DIRECT";
}
// Otherwise, send through proxy
return "PROXY proxy.college.edu:4545";
}
function FindProxyForURL(url,host) {
// If the host requested on the URL line is not a FQDN
// (eg, it is 'www'), then don't proxy.
if (isPlainHostName(host)) {
return "DIRECT";
}
// Make OPAC stations go through Proxy Server
if (myIpAddress() == "10.243.20.242" ||
myIpAddress() == "10.243.21.210" ||
myIpAddress() == "10.243.21.241" ||
myIpAddress() == "10.243.22.13" ||
myIpAddress() == "10.243.22.19" ||
myIpAddress() == "10.243.22.35" ||
myIpAddress() == "10.243.22.41" ||
myIpAddress() == "10.243.22.182") {
return "PROXY proxy.college.edu:4545";
}
// Everyone else can go directly to the origin server
return "DIRECT";
}
function FindProxyForURL(url,host) {
// If the host requested on the URL line is not a FQDN (eg, it is 'www'),
// then don't proxy.
if (isPlainHostName(host)) {
return "DIRECT";
}
// Now do the list of IP-restricted services; they go through the proxy
if (shExpMatch(host, "*eb.com") || shExpMatch(host, "*oclc.org")) {
return "PROXY proxy1.college.edu:4545; PROXY proxy2.college.edu:4545";
}
// Otherwise, go directly to the origin server
return "DIRECT";
}
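A PAC file can be tried outside the browser by supplying stand-ins for the helper functions the browser normally provides. The sketch below repeats the last example and adds simplified substitutes for isPlainHostName() and shExpMatch(); these substitutes are assumptions written for this test only, not the browser's implementations.

```javascript
// Stand-in: a "plain" host name contains no dots.
function isPlainHostName(host) {
  return host.indexOf(".") === -1;
}

// Stand-in: shell-style match where * matches any run of characters
// and ? matches a single character.
function shExpMatch(str, pattern) {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const source = "^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$";
  return new RegExp(source).test(str);
}

// The PAC function from the example above, unchanged.
function FindProxyForURL(url, host) {
  if (isPlainHostName(host)) {
    return "DIRECT";
  }
  if (shExpMatch(host, "*eb.com") || shExpMatch(host, "*oclc.org")) {
    return "PROXY proxy1.college.edu:4545; PROXY proxy2.college.edu:4545";
  }
  return "DIRECT";
}

console.log(FindProxyForURL("http://www/", "www"));                // DIRECT
console.log(FindProxyForURL("http://www.eb.com/", "www.eb.com"));  // PROXY proxy1... ; PROXY proxy2...
console.log(FindProxyForURL("http://www.ala.org/", "www.ala.org")); // DIRECT
```

One thing a test like this brings out: the pattern "*eb.com" matches any host name that ends in those characters, so a host such as www.web.com would also be sent through the proxy. A pattern of "*.eb.com" is narrower, at the cost of no longer matching the bare name eb.com.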
4.54.39.182 - - [01/Aug/2000:00:17:10 -0500] "GET http://rave.ohiolink.edu/databases/login/abig HTTP/1.0" 302 296
4.54.39.182 - - [01/Aug/2000:00:17:11 -0500] "GET http://olc7.ohiolink.edu/cgi-bin/login/abig HTTP/1.0" 302 257
4.54.39.182 - - [01/Aug/2000:00:17:13 -0500] "GET http://olc7.ohiolink.edu/bin/gate.exe?f=login&p_lang=english&p_d=abig HTTP/1.0" 302 247
4.54.39.182 - - [01/Aug/2000:00:17:19 -0500] "GET http://olc7.ohiolink.edu/bin/gate.exe?f=search&state=3v0758.1.1 HTTP/1.0" 200 7164
4.54.39.182 - - [01/Aug/2000:00:17:20 -0500] "GET http://olc7.ohiolink.edu/style/dw.css HTTP/1.0" 304 -
4.54.39.182 - - [01/Aug/2000:00:17:40 -0500] "GET http://olc7.ohiolink.edu/bin/gate.exe?f=brwsidx&state=3v0758.1.1&p_IdxPara=JNBR&p_IdxTerm= HTTP/1.0" 200 2349
4.54.39.182 - - [01/Aug/2000:00:17:47 -0500] "GET http://olc7.ohiolink.edu/cgi-bin/submit-brws?f=brwsidx&state=3v0758.2.1&p_L=8&p_IdxTerm=The+Economist HTTP/1.0" 302 286
4.54.39.182 - - [01/Aug/2000:00:18:41 -0500] "GET http://olc7.ohiolink.edu/bin/gate.exe?f=brwsidx&state=3v0758.2.1&p_L=8&p_IdxTerm=The-Economist HTTP/1.0" 200 5323
4.54.39.182 - - [01/Aug/2000:00:19:14 -0500] "GET http://olc7.ohiolink.edu/bin/gate.exe?f=logout&state=3v0758.3.1&p_goto=cdb HTTP/1.0" 302 224
4.54.39.182 - - [01/Aug/2000:00:17:19 -0500] "GET http://olc7.ohiolink.edu/bin/gate.exe?f=search&state=3v0758.1.1 HTTP/1.0" 200 7164
Portions adapted from Apache Module mod_log_common documentation.
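The fields of a Common Log Format entry can be pulled apart with a single regular expression. The sketch below is one way to do it, applied to the sample line above; the function name, the pattern, and the choice to report a "-" byte count as 0 are assumptions made for illustration.

```javascript
// host ident authuser [date] "request" status bytes
const CLF_PATTERN = /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)$/;

function parseCommonLogLine(line) {
  const m = CLF_PATTERN.exec(line);
  if (m === null) return null; // not a Common Log Format line
  const [method, url, protocol] = m[5].split(" ");
  return {
    host: m[1],      // client IP address or host name
    ident: m[2],     // identd result, usually "-"
    authuser: m[3],  // authenticated user name, "-" if none
    date: m[4],      // time the request was received
    method: method,
    url: url,        // through a proxy this is the full URL
    protocol: protocol,
    status: Number(m[6]),
    // "-" means no body was sent (for example a 304 response).
    bytes: m[7] === "-" ? 0 : Number(m[7]),
  };
}

const entry = parseCommonLogLine(
  '4.54.39.182 - - [01/Aug/2000:00:17:19 -0500] ' +
  '"GET http://olc7.ohiolink.edu/bin/gate.exe?f=search&state=3v0758.1.1 HTTP/1.0" 200 7164'
);
console.log(entry.host, entry.status, entry.bytes); // 4.54.39.182 200 7164
```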
Log Files
We use IP addresses to analyze trends, administer the site, and gather broad demographic information for aggregate use. IP addresses are not linked to personally identifiable information. Your login ID/barcode may be linked to specific log entries, but this identification is removed before statistics are generated.
HTTP/1.0 200 Ok
Content-type: text/html

<HTML>
<HEAD><TITLE>Can't go there</TITLE></HEAD>
<BODY><P>Sorry -- you can't get there from this workstation.</P></BODY>
</HTML>
if (shExpMatch(host, "*eb.com") || shExpMatch(host, "*oclc.org")) {
return "DIRECT";
}
return "PROXY www.institution.org:8080";
The following sites provide services that some libraries wish to block. These are not blocked on content, but on the type of service they provide. Blocking these sites enables libraries to keep their public web stations available for research and information retrieval. The intention is to block only the parts of sites that provide the forbidden services.
The ProxyBlock directive specifies a list of words, hosts and/or domains, separated by spaces. HTTP, HTTPS, and FTP document requests to sites whose names contain matched words, hosts or domains are blocked by the proxy server. The proxy module will also attempt to determine IP addresses of list items which may be hostnames during startup, and cache them for match test as well. Example:
ProxyBlock joes-garage.com some-host.co.uk rocky.wotsamattau.edu
'rocky.wotsamattau.edu' would also be matched if referenced by IP address.
Adapted from Apache Module mod_proxy documentation.
Search the Apache Module Registry for "authentication":
http://modules.apache.org/search?search=Authentication&query=true
No. The term "Reverse Proxy" is used to describe the functions of a proxy server that sits between the Internet and the origin server.
URLs are being rewritten on the page to point back through the EZproxy server.
http://www.altavista.com/ maps to
http://concerto.law.uconn.edu:2050/
http://doc.altavista.com/help/search/search_help.shtml maps to
http://concerto.law.uconn.edu:2052/help/search/search_help.shtml
http://shopping.altavista.com/ maps to
http://concerto.law.uconn.edu:2054/
http://tools.altavista.com/ maps to
http://concerto.law.uconn.edu:2055/
Mappings are stored in the "ezproxy.hst" file.
Last year, a version of EZproxy was released with a new scheme for rewriting URLs. Using wildcard DNS entries and the "Host:" header, URLs can be rewritten as such:
http://www.altavista.com/ maps to
http://80-www.altavista.com.ezproxy.law.uconn.edu/
http://doc.altavista.com/help/search/search_help.shtml maps to
http://80-doc.altavista.com.ezproxy.law.uconn.edu/help/search/search_help.shtml
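The host-name scheme can be sketched as a pair of functions: one that rewrites an origin URL into its proxied form, and one that recovers the origin URL from the host name a browser sends back in the "Host:" header. This only illustrates the mapping shown above and is not EZproxy's code; the function names are hypothetical, and the sketch assumes plain HTTP URLs only.

```javascript
// Proxy host name suffix taken from the examples above.
const PROXY_SUFFIX = "ezproxy.law.uconn.edu";

// http://host[:port]/path  ->  http://port-host.suffix/path
function toProxyUrl(originUrl, proxySuffix) {
  const u = new URL(originUrl);
  // URL.port is the empty string when the default port is in use.
  const port = u.port !== "" ? u.port : "80";
  return "http://" + port + "-" + u.hostname + "." + proxySuffix + u.pathname + u.search;
}

// http://port-host.suffix/path  ->  http://host[:port]/path
function toOriginUrl(proxyUrl, proxySuffix) {
  const u = new URL(proxyUrl);
  const tail = "." + proxySuffix;
  if (!u.hostname.endsWith(tail)) return null; // not one of our rewritten URLs
  const label = u.hostname.slice(0, -tail.length); // e.g. "80-www.altavista.com"
  const dash = label.indexOf("-");
  if (dash === -1) return null;
  const port = label.slice(0, dash);
  const host = label.slice(dash + 1);
  const portPart = port === "80" ? "" : ":" + port;
  return "http://" + host + portPart + u.pathname + u.search;
}

console.log(toProxyUrl("http://www.altavista.com/", PROXY_SUFFIX));
// http://80-www.altavista.com.ezproxy.law.uconn.edu/
console.log(toOriginUrl("http://80-www.altavista.com.ezproxy.law.uconn.edu/", PROXY_SUFFIX));
// http://www.altavista.com/
```

Because the origin host and port are carried in the host name itself, no per-host port table is needed; the wildcard DNS entry makes every such name resolve to the proxy server.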
How is this better?
What do you give up?
T Database Title
U http://url.to.database/search/
D domains.used.by.vendor.com
D another.domain.com

T LegalTRAC from Gale
U http://infotrac.galegroup.com/itweb/nellco_main
D galegroup.com
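The T/U/D stanzas are simple enough to read with a few lines of code, which can be handy for checking a configuration before deploying it. The sketch below groups lines into one record per database. It reads T as the title, U as the starting URL, and D as a domain, as the sample entries suggest; the function name and record layout are hypothetical, and this is not how EZproxy itself reads the file.

```javascript
// Group "T", "U" and "D" lines into one record per database.
function parseStanzas(text) {
  const stanzas = [];
  let current = null;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    const space = line.indexOf(" ");
    if (space === -1) continue; // blank line or a line with no value
    const key = line.slice(0, space);
    const value = line.slice(space + 1).trim();
    if (key === "T") {
      // A title line starts a new stanza.
      current = { title: value, url: null, domains: [] };
      stanzas.push(current);
    } else if (current !== null && key === "U") {
      current.url = value;
    } else if (current !== null && key === "D") {
      current.domains.push(value);
    }
  }
  return stanzas;
}

const sample = [
  "T LegalTRAC from Gale",
  "U http://infotrac.galegroup.com/itweb/nellco_main",
  "D galegroup.com",
].join("\n");

const databases = parseStanzas(sample);
console.log(databases[0].title);   // LegalTRAC from Gale
console.log(databases[0].domains); // [ 'galegroup.com' ]
```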
Back to the discussion of network layers:
| Application Layer | |
| Transport Layer | Port |
| Internet Layer | IP address |
| Data Link (or Network Interface) Layer | Ethernet Address |
Remember where proxy servers were located? Interception Proxies operate at a different location!
For each package, we'll look at:
"This module implements a proxy/cache for Apache. It implements proxying capability for FTP, CONNECT (for SSL), HTTP/0.9, and HTTP/1.0. The module can be configured to connect to other proxy modules for these and other protocols."
http://www.apache.org/docs/mod/mod_proxy.html
"Squid is a high-performance proxy caching server for web clients, supporting FTP, gopher, and HTTP data objects."
http://www.squid-cache.org/Doc/FAQ/FAQ-1.html#ss1.1
"Libproxy is a simple rewriting pass-through proxy system designed especially for libraries."
From the README file
"DeleGate is a multi-purpose application level gateway, or a proxy server which runs on multiple platforms. DeleGate mediates communication of various protocols, applying cache and conversion for mediated data, controlling access from clients and routing toward servers. It translates protocols between clients and servers, merging several servers into a single server view with aliasing and filtering."
http://wall.etl.go.jp/delegate/
"The enterprise firewall and Web cache server."
http://www.microsoft.com/isaserver/
ISA Server includes an extensible, multilayer enterprise firewall featuring security with packet-, circuit-, and application-level traffic screening, stateful inspection, broad application support, integrated virtual private networking (VPN), system hardening, integrated intrusion detection, smart application filters, transparency for all clients, advanced authentication, secure server publishing, and more.
"The iPlanet Web Proxy Server is a powerful system for caching and filtering Web content and boosting network performance."
http://www.iplanet.com/products/iplanet_proxy/home_2_1_1ae.html
"WinProxy provides everything you need to simultaneously connect all your computers to the Internet through just one simple connection with your existing service provider."
http://www.winproxy.com/
"EZproxy provides the easiest way for libraries to extend web-based licensed databases to their remote users."
http://www.usefulutilities.com/ezproxy
"Remote Database Access (RDA) Service: The Complete Solution for all your Remote Authentication Needs"
http://www.obvia.com/
"WebManager™ is award-winning Internet content management software. With WebManager, students will learn more and finish research faster because they'll encounter fewer distractions and delays. WebManager's caching speeds Internet access up to ten times faster - increasing the number of students you can effectively serve, without increasing the number of computers required!"
http://www.sagebrushcorp.com/tech/webmanager.cfm
"As libraries increasingly use Web-based content subscription services, they need to authenticate remote patrons who use the Web to access online resources. Remote Patron Authentication (RPA) from epixtech enables libraries to authenticate patrons outside a library facility before providing them access to restricted resources."
http://www.epixtech.com/product/rpa.htm
Shibboleth, a project of Internet2's Middleware Architecture Committee for Education, is investigating technology to support inter-institutional authentication and authorization for access to Web pages. The intent is to support, as much as possible, the heterogeneous security systems in use on campuses today, rather than mandating use of particular schemes like Kerberos or X.509-based PKI.
Adapted from http://middleware.internet2.edu/shibboleth/
Shibboleth \Shib"bo*leth\, n. [Heb. shibb[=o]leth an ear of corn, or a stream, a flood.]
- A word which was made the criterion by which to distinguish the Ephraimites from the Gileadites. The Ephraimites, not being able to pronounce sh, called the word sibboleth. See Judges xii.
- Hence, the criterion, test, or watchword of a party; a party cry or pet phrase.
Adapted from http://middleware.internet2.edu/shibboleth/why-shibboleth.html and Webster's Revised Unabridged Dictionary (1913).
E-mail: proxy@PandC.org