Advancing Water Resources Research and Management
| Symposium on Water Resources and the World Wide Web |
|---|
| Seattle, Washington, December 5-9, 1999 |
Kenneth J. Lanfear¹
| Table of Contents |
|---|
| Abstract |
| Introduction |
| Log Analysis Tools |
| Basic Usage Statistics |
| Usage by User Class |
| Usage by Activity |
| Returning Users |
| Discussion and Conclusion |
| Definitions: |
|---|
| Hit -- One request for a file. In this analysis, only successful requests are counted. The number of hits is a gross measure of webserver activity and can be influenced both by website design (e.g., the number of images per page) and by users and networks storing (caching) files locally. |
| Page -- A single view presented to a user, similar to a page in a book. A page may consist of several files (hits): one for the text, plus one for each image. |
| User -- A single internet address. Although commonly equated with one person, it is possible for one person to have several internet addresses (e.g. home and work) or for many people to share one address. |
| Visit -- A sequence of hits by one user, with each hit separated from the previous hit by no more than 30 minutes. |
By reading the log for a time period, we can count the hits made on the site, determine how often each page is viewed, and estimate the number of users. Domain naming conventions (government, education, commercial, foreign, etc.) make it possible to infer each user's type. USGS policy protects the privacy of individual users but permits statistical compilations of classes of users.
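A minimal sketch of this kind of tally follows. It assumes the webserver writes the NCSA Common Log Format (host, identity, user, [timestamp], "request", status, bytes), treats only 2xx responses as successful hits, and counts any request that is not for an image file as a page view. The function name `tally`, the regular expression, and the image-extension list are illustrative assumptions, not details of log_profile.pl.

```python
import re

# Assumed NCSA Common Log Format record:
# host ident authuser [dd/Mon/yyyy:HH:MM:SS zone] "METHOD /path PROTOCOL" status bytes
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')

# Hypothetical rule: requests for these file types are images, not pages.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".xbm")


def tally(lines):
    """Count hits, page views, and distinct users in an iterable of log lines."""
    hits = 0
    page_views = {}  # path -> number of views
    users = set()    # one user = one internet address (host)
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip malformed records
        host, _when, _method, path, status, _size = m.groups()
        if not status.startswith("2"):
            continue  # assumption: only 2xx responses count as successful hits
        hits += 1
        users.add(host)
        path = path.split("?", 1)[0]  # ignore any query string
        if not path.lower().endswith(IMAGE_EXTENSIONS):
            page_views[path] = page_views.get(path, 0) + 1
    return {
        "hits": hits,
        "pages": sum(page_views.values()),
        "users": len(users),
        "page_views": page_views,
    }
```

Because a page with two images produces three hits but one page view, the gap between the "hits" and "pages" totals reflects site design as much as user activity, which is why hits are only a gross measure.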
A visit is analogous to a customer coming into our store, which makes it a very important measure of website usage. The characteristics of each visit -- the number of hits, the paths followed, and so on -- can be analyzed to learn more about user behavior. To analyze visits, however, the log must first be sorted by user and then by time. Because log files for popular websites can have millions of records, sorting them can be a difficult undertaking, and for this reason simple log analysis programs do not record visits.
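The sort-then-split procedure can be sketched as follows, assuming the hits have already been reduced to (user, timestamp) pairs. A new visit starts at a user's first hit and whenever more than 30 minutes separate consecutive hits, following the definition of a visit above. The function name `count_visits` is hypothetical. This sketch sorts in memory, which sidesteps the difficulty just described; a log with millions of records would likely need an external (disk-based) sort instead.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

VISIT_GAP = timedelta(minutes=30)  # maximum gap between hits within one visit


def count_visits(hits):
    """hits: iterable of (user, datetime) pairs for successful requests.
    Returns a dict mapping each user to that user's number of visits."""
    visits = {}
    # Sort by user, then by time, as the visit definition requires.
    ordered = sorted(hits, key=itemgetter(0, 1))
    for user, records in groupby(ordered, key=itemgetter(0)):
        n = 0
        previous = None
        for _, when in records:
            # The first hit, or a gap longer than 30 minutes, starts a new visit.
            if previous is None or when - previous > VISIT_GAP:
                n += 1
            previous = when
        visits[user] = n
    return visits
```

Grouping only works after sorting: `itertools.groupby` merges adjacent records only, so an unsorted log would split one user's hits into many spurious groups.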
To compile the statistics presented in this paper, we developed the log_profile.pl log analysis program. log_profile.pl can handle log files from multiple webservers, select the users and pages to analyze, and classify them into groups for analysis. Both the current version of the program and its manual page are included with this paper. Although commercial log analysis software now offers many of the same capabilities, log_profile.pl provided them as early as 1995, allowing us to compile the relatively long (at least in WWW time!) histories presented in this paper.
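One way to classify users into groups is an ordered list of host-name patterns in which the first match wins. The sketch below assumes that approach; the rule list, the class labels, and the placeholder "major network" provider names (`isp-one.net`, `isp-two.net`) are hypothetical and are not taken from log_profile.pl. It also converts per-host visit counts into the percentage of visits per class, the kind of figure reported for May 1999.

```python
import re
from collections import Counter

# Hypothetical, ordered classification rules: the first matching pattern wins,
# so the more specific rules (in-house, major network) come first.
RULES = [
    (r"\.usgs\.gov$", "in-house"),
    (r"\.(gov|mil)$", "government"),
    (r"\.edu$", "educational"),
    (r"\.(isp-one|isp-two)\.net$", "major network"),  # placeholder providers
    (r"\.(com|org|net)$", "commercial"),
    (r"\.[a-z]{2}$", "foreign"),  # two-letter country-code domains
]
COMPILED = [(re.compile(p, re.IGNORECASE), label) for p, label in RULES]


def classify(host):
    """Assign a host name to a user class using the first matching rule."""
    for pattern, label in COMPILED:
        if pattern.search(host):
            return label
    return "unknown"  # e.g., unresolved numeric IP addresses


def percent_by_class(visits_by_host):
    """Convert {host: visits} into {class: percent of all visits}."""
    totals = Counter()
    for host, n in visits_by_host.items():
        totals[classify(host)] += n
    grand = sum(totals.values())
    if grand == 0:
        return {}
    return {label: 100.0 * n / grand for label, n in totals.items()}
```

Rule order matters: without the in-house rule placed first, USGS hosts would simply be counted as government users.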
Typical output from log_profile.pl is shown in Appendix A. These reports began in November 1994 and provide a nearly continuous record of monthly webserver operations. The capability to analyze visits became available in February 1996. The discussion that follows is based on these reports.
Figure 1. Hits each month for the water.usgs.gov webserver, Nov 1994 - May 1999.
Part of this growth undoubtedly is due to general growth of the internet. However, by documenting usage, we were able to provide managers the incentive to develop additional content and services. A good tool for measuring success gave managers more confidence to allocate resources for WWW distribution. More content on the website, in turn, led to more usage.
Superimposed upon the large growth rate (figures 1 and 2) are an apparent annual cycle and large random month-to-month variations. Although the overall trend is strongly upward, activity actually declines in some months. Hence, month-to-month statistics must be compared with great caution.
The annual growth rate of the webserver (figure 3) exhibits a more consistent pattern. Annual doubling in the early years probably was unsustainable, and increases have since stabilized at more realistic levels. The growth rate now seems to be holding steady at 50 percent per year.
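The annual growth rate in figure 3 compares hits in each month with hits in the same month of the previous year, which cancels out the annual cycle that makes month-to-month comparisons unreliable. A minimal sketch of that calculation follows, assuming monthly hit totals keyed by (year, month); the function name `annual_growth` is hypothetical.

```python
def annual_growth(monthly_hits):
    """monthly_hits maps (year, month) -> hits.
    Returns (year, month) -> percent growth over the same month a year earlier."""
    growth = {}
    for (year, month), hits in monthly_hits.items():
        prior = monthly_hits.get((year - 1, month))
        if prior:  # skip months with no (or zero) count a year earlier
            growth[(year, month)] = 100.0 * (hits - prior) / prior
    return growth
```

With made-up counts of 1,000, 2,000, and 3,000 hits in May of three consecutive years, the function reports 100 percent growth (a doubling) for the second year and 50 percent for the third.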
Figure 2. Users and visits served each month by the water.usgs.gov webserver, Nov 1994 - May 1999.
Figure 3. Annual rate of growth of the water.usgs.gov webserver, as determined by comparing hits in the present month to the same month in the previous year.
Figure 4. Percentage of visits by type of domain, May 1999.
Commercial and organization users account for about two-thirds of the activity on water.usgs.gov. Users from universities account for another 12 percent, and government users for 8 percent. In-house usage is the only category in which the number of users is no longer growing: nearly all USGS employees involved with water resources already access water.usgs.gov regularly.
Visits from each type of user (figure 5) show clear annual patterns for some user classes. Educational usage varies with the academic year, with peak usage apparently coming at the ends of semesters. Commercial and government user classes also exhibit a seasonal pattern, with greater activity in the early spring, perhaps coinciding with snowmelt flooding, and during the summer recreational season. Foreign usage exhibits little seasonal pattern. Usage for all classes declines in December, probably because of holidays.
The major network category is influenced by the service practices of the network providers. These practices change over time and may obscure true patterns in seasonal usage.
Figure 5. Visits each month by users from government, educational, commercial, major network, and foreign domains.
Figure 6. Activity of the National Water Conditions and the Water Use WWW pages, Nov 1994 - May 1999.
The National Water Conditions pages continue, in electronic format, an activity begun more than 50 years ago. Prior to its conversion to the web, the monthly National Water Conditions report had been distributed, without charge, in printed format to a circulation list of about 5,000 subscribers. Subscribers tended to be professional water managers in and outside of government. The report was converted to the web to reduce mailing costs and to respond to user requests for faster publication. The electronic version now reaches about 5 to 6 times as many readers as the previous hard-copy version. Forty percent of the identified users are from government and academia, compared with 20 percent for the water.usgs.gov website as a whole, indicating that the application has retained much of its orientation towards professional users. With some exceptions, usage of the National Water Conditions pages has shown fairly steady month-to-month growth (figure 6, green line), indicative of users who return to the site regularly each month.
The Water Use application has a strong orientation towards K-12 education, highlighted by a major section on Water Science for Schools. Users of this application are more likely to come from commercial or major network domains. Although its activity has grown substantially over the long term, month-to-month activity of the Water Use application (figure 6, orange line) has been more erratic, possibly as a result of usage generated by school homework assignments.
Figure 7. Percent of users who visit the water.usgs.gov website more than once in the same month.
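The returning-user statistic in figure 7 can be computed once each user's visits have been counted for each month. The sketch below assumes a mapping from a month label to {user: visits in that month}; the function name `percent_returning` is hypothetical. Because a user is an internet address, a person whose address changes between visits would appear as several one-visit users, so this measure may understate true return rates.

```python
def percent_returning(visits_by_month):
    """visits_by_month maps a month label to {user: visits in that month}.
    Returns, for each month, the percent of users with more than one visit."""
    result = {}
    for month, per_user in visits_by_month.items():
        if not per_user:
            continue  # no users recorded for this month
        repeat = sum(1 for n in per_user.values() if n > 1)
        result[month] = 100.0 * repeat / len(per_user)
    return result
```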
In a relatively short period of 6 years, the World Wide Web has grown into the largest distribution channel for USGS water information, reaching perhaps 10 times as many people as conventional channels. Log analysis has been critical in documenting this growth and the benefits it brings.
¹ Chief, World Wide Web Program for Water Resources, U.S. Geological Survey, 12201 Sunrise Valley Drive, MS 438, Reston, Virginia 20192. Email: lanfear@usgs.gov
² The use of brand, trade, or firm names in this report is for identification purposes only and does not constitute endorsement by the U.S. Geological Survey.
Copyright © 1999 American Water Resources Association
Content produced by U.S. Government employees as part of official duties is not subject to copyright.