Advancing Water Resources Research and Management

AWRA SYMPOSIUM ON GIS AND WATER RESOURCES
Sept 22-26, 1996
Ft. Lauderdale, FL

------

DEVELOPMENT OF A GIS DATABASE FOR LAKE ECOSYSTEM STUDIES

Weihe Guan and Leslie Moore

ABSTRACT

A high quality database is the starting point for any GIS application. For Lake Okeechobee, a large amount of spatial data has been accumulated by numerous ecosystem studies. This paper introduces a GIS database development in support of these studies. The project includes the following components: the establishment of a data server conveniently accessible to all users; the selection of appropriate RDBMS/GIS software for both attribute data and geographic data; the development of a general data format and database structure; the formalization of data management procedures, including input, update, conversion, QA/QC and backup; and the implementation of data query and retrieval utilities for end users to search, display, print or plot.

KEY TERMS: GIS; Database; Lake Okeechobee; Ecosystem.

INTRODUCTION

Lake Okeechobee and Associated Ecosystem Studies

Lake Okeechobee (Figure 1) is the central feature of the interconnected Kissimmee River / Lake Okeechobee / Everglades ecosystem in south Florida. It is a large (approximately seven hundred square miles), shallow (average depth about 10 feet), subtropical lake which provides water, flood protection, and recreational benefits for a population exceeding 3.5 million. The lake is also an important biological habitat for economically important fish and wildlife, including several threatened and endangered species (Aumen, 1995).

Various processes have contributed to the deterioration of the Lake Okeechobee ecosystem. Among them is excessive nutrient loading, which has caused an increase in blue-green algal blooms. These blooms, characterized by surface scums and unpleasant tastes and odors, raised concerns about declining water quality (Aumen, 1995). Several interdisciplinary, multi-year research efforts were initiated in the late 1980s in response to the algal blooms, including a lake ecosystem study. The Lake Okeechobee Ecosystem Study (LOES), conducted by the University of Florida under a contract with the South Florida Water Management District (SFWMD), was unique in that it looked beyond excessive nutrient loading to other components of the ecosystem. Research topics ranged from water level effects and water quality to fish and wading birds. The study's objective was to provide an ecological baseline against which future ecosystem trends can be compared, as well as an overall assessment of ecosystem health.

Data Related to Lake Okeechobee Ecosystem Study

The Lake Okeechobee Ecosystem Study addressed the following (University of Florida, 1991): (1) data synthesis, modelling, and database management; (2) water chemistry and physical parameters; (3) community and ecosystem ecology of emergent macrophytes; (4) phytoplankton, bacteria, epiphytes, submerged plants, and zooplankton; (5) patterns of distribution and abundance, and the reproductive and foraging ecology, of wading birds; and (6) investigations of larval and juvenile fish.

The project involved extensive field data collection and analysis. Data related to the study were archived on floppy disks using Lotus 1-2-3 spreadsheets, WordPerfect documents, ASCII text files and ERDAS images. Data were categorized as follows: (1) plankton - bacteria, bioassays, nitrogen fixation, phytoplankton, and zooplankton; (2) plants - emergent, nutrients, seeds, soils, and submergent; (3) water quality - chlorophyll, nutrients, physical chemistry, and suspended solids; (4) wildlife - birds and fish; (5) hydrology - Lake Okeechobee hydrological data; (6) spatial data - GIS coverages, images, locational files, etc.; and (7) documentation - various text files for clarification, identification and explanation.

Why GIS

GIS was identified as an important tool for the lake ecosystem study. As stated in a LOES annual report (University of Florida, 1991), "the objective of this task is to use an ecosystems approach to develop a set of tools (models and GIS databases) that integrates the data from various tasks in this project and other projects to provide predictive capabilities with which the SFWMD can evaluate the consequences of various water management options on the marsh littoral zone of Lake Okeechobee and its interaction with the pelagic zone of the lake. Included are impacts on the fish and wildlife resources, the role of the marsh in nutrient dynamics and exchange with the lake, and estimates of the total flux of nutrients to and from the lake under different water regimes." The following areas of focus were suggested by the LOES review panel (University of Florida, 1991): (1) impact of lake stage on biotic communities; (2) impact of nutrient concentrations on biotic communities and trophic relationships; (3) direct and indirect effects of plant community structure on critical habitat and energy flow to wading birds and fishes; (4) role of littoral zone in the lake's ecology; (5) effects of water pumping from canals to the lake on lake communities and productivity; and (6) role of exotics in lake ecology.

The LOES review panel further stated (University of Florida, 1991): "A spatially based predictive model utilizing a GIS approach will be developed, capable of predicting the responses of fish and wildlife resources to management options such as lake stage manipulation, nutrient loading increases or decreases, or the long-term effects of maintaining the present regimes of stage, nutrients and flows. The model will be sufficient to provide indications of the magnitude of the changes in the system and the spatial location of such changes utilizing the GIS information layers and model parameters derived from the various tasks of this and other projects." To effectively use information collected by LOES and other studies, a functional GIS database is a high-priority need. It should include a data server, an integrated database structure, a database management procedure, and a data query and retrieval user interface.

DATA SERVER

A data server is a computing platform hosting databases. It can include computers, operating systems, networks, database management systems (DBMS), geographic information systems and other hardware/software necessary to input, store, manage, query and output data (Cowen et al., 1995). Ideally, a data server should be selected according to the conceptual design of the database to be developed on it. In reality, many constraints, especially financial ones, limit the available options for data server selection.

In this study, the computing environment consists of ORACLE on a VAX minicomputer and ARC/INFO and ARCVIEW2 on SUN SPARC workstations. Some workstations use SUN OS and others use Solaris as the operating system. All computers are networked on an FDDI ring. Desk-top workstations (SPARC 2, 10 or 20) are available to end-users of the GIS database.

The GIS database addressed in this study is a special component of a relational database hosting all the information collected by LOES. The database has two components: ARC coverages for geographic features and unique feature IDs; and ORACLE tables for attribute items, with the feature ID as a unique key. The connection between the two is built on the Database Integrator in ARC/INFO and ARCVIEW2.
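The two-part structure described above can be illustrated with a minimal sketch. It is not the Database Integrator itself: an in-memory SQLite table stands in for ORACLE, a plain list stands in for the coverage feature table, and the table name, column names and sample values are all hypothetical.

```python
import sqlite3

# Stand-in for a coverage feature attribute table: geometry is omitted and
# only the user-specified unique ID is kept (IDs are hypothetical).
coverage_features = [
    {"unique_id": 101, "feature": "point"},
    {"unique_id": 102, "feature": "point"},
]

# Stand-in for the ORACLE side: a table keyed on the same unique ID.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE station_attr ("
    "unique_id INTEGER PRIMARY KEY, name TEXT, chlorophyll REAL)"
)
conn.executemany(
    "INSERT INTO station_attr VALUES (?, ?, ?)",
    [(101, "STATION_A", 12.5), (102, "STATION_B", 30.1)],  # illustrative values
)


def attributes_for(feature):
    """Return the attribute record linked to a coverage feature, or None."""
    return conn.execute(
        "SELECT name, chlorophyll FROM station_attr WHERE unique_id = ?",
        (feature["unique_id"],),
    ).fetchone()
```

Selecting a feature on the map side then reduces to a keyed lookup on the tabular side, which is why the feature ID must be unique in both components.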

Due to software and storage space limitations, the database could not be hosted by a single computer. It resides on several computers across a network. Tabular data are stored in ORACLE tables on the VAX, and spatial data are stored in ARC coverage format on several workstations. One workstation, a SUN SPARC 20, is designated as the "virtual" data server. The graphic user interface for data query and retrieval is installed on this workstation, with the directory tree for all ARC coverages. Symbolic links in sub-directories point to the actual locations of the coverages on other workstations. When users query on coverage features, a workstation-VAX interface links to the ORACLE database and returns appropriate tables and records (Figure 2).
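The symbolic-link arrangement can be sketched as follows. This is an illustration only, assuming a POSIX file system; temporary directories stand in for the remote workstations, and the coverage names and the helper `build_virtual_tree` are hypothetical.

```python
import os
import tempfile


def build_virtual_tree(virtual_root, coverage_locations):
    """Create one symbolic link per coverage under virtual_root.

    coverage_locations maps a coverage name to the directory that actually
    holds it (for example, a network-mounted path on another workstation).
    """
    os.makedirs(virtual_root, exist_ok=True)
    for name, real_path in coverage_locations.items():
        link_path = os.path.join(virtual_root, name)
        if not os.path.islink(link_path):
            os.symlink(real_path, link_path)


# Temporary directories stand in for storage on two other workstations.
base = tempfile.mkdtemp()
remote_a = os.path.join(base, "workstation_a", "vegetation")
remote_b = os.path.join(base, "workstation_b", "stations")
os.makedirs(remote_a)
os.makedirs(remote_b)

# The "virtual" server exposes a single directory tree to its users.
virtual_root = os.path.join(base, "virtual_server", "coverages")
build_virtual_tree(virtual_root, {"vegetation": remote_a, "stations": remote_b})
```

Users browse only `virtual_root`; moving a coverage to another machine requires changing one link rather than every user's path.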

This design provides users with a seemingly holistic GIS database on the virtual data server. Users need not know the real storage locations of the database components, nor do they need to interact with any computer platform other than the virtual data server. At the same time, the group of networked computers collectively provides the storage and computing capacity required by the database, which no single computer offers. As the database grows, more computers may be brought into the group without much impact on existing server components.

The disadvantage of this data server structure is its reliance on the network and each computer in the structure. If one computer goes off the network, the database will not function properly. Moreover, the database manager must have access to all database computers for maintenance purposes. Because many of the database workstations are routinely used by SFWMD staff, special effort also is needed to coordinate workstation use.

DATABASE STRUCTURE

Database structure development went through several steps: (1) evaluation of contents and formats of existing data files; (2) interviews of end users and summaries of their needs for update, query and retrieval; (3) design of the locational model as a bridge between tabular data and geographic features; and (4) design of the geographic object model to accommodate the data.

The Locational Model

Most LOES data were collected at a known location, called STATION, with x-y coordinates recorded in longitude and latitude. However, the observation made at a given STATION may represent the ecological characteristics of a point at that location, or a polygon surrounding that location. Examples are a bird nest (point) or a vegetation type (polygon). Other observations were made at a location with only a verbal description, but no x-y coordinates. Some observations were made along a transect line, and location was recorded with an origin (point) and an aspect and distance from the origin. Still other observations were made first in an area (polygon) and at several points within that area, such as a bird colony (polygon) and nests (points). The logical and efficient modeling of locational characteristics and relationships for these diverse data sets is a challenge to database development.
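For transect observations, the recorded origin, aspect and distance are enough to recover a position. The sketch below assumes planar (projected) coordinates and an aspect expressed as a compass bearing in degrees clockwise from north; neither convention is stated in the study, and the function name is hypothetical.

```python
import math


def transect_point(origin_x, origin_y, aspect_deg, distance):
    """Planar position reached from an origin along a compass bearing.

    aspect_deg is measured clockwise from north (assumed convention), and
    distance is in the same units as the coordinates.
    """
    theta = math.radians(aspect_deg)
    return (origin_x + distance * math.sin(theta),
            origin_y + distance * math.cos(theta))
```

With this convention, an aspect of 90 degrees moves the point due east of the origin and an aspect of 0 degrees moves it due north.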

Figure 3 presents the locational model, one module of the relational database object model (Lostal, 1995). STATION is classified into nine types: (1) independent point; (2) transect point; (3) independent area; (4) transect area; (5) colony; (6) nest; (7) egg; (8) nestling; and (9) bird watch point. Types five through nine are inheritable features with a many-to-one relationship (i.e., many nests are in one colony; many eggs or nestlings are in one nest; etc.).

STATION usually has a unique ID, name, type, description, and x-y coordinate pair. STATION also can have a z (elevation) value. Soil and vegetation information can be recorded as a background environment, and two STATION locations may be linked by a transect (line) or grouped into a region with some ecological meaning (polygon).

A geographic object model was designed according to the locational model. Each STATION type and its spatial relationship with other STATION types was mapped into the geographic object model: (1) independent point - point in point coverage; (2) transect point - node in line coverage; (3) independent area - polygon in polygon coverage; (4) transect area - polygon in polygon coverage or pixel in grid; (5) colony - region in polygon coverage; (6) nest - point in point coverage; (7) egg and (8) nestling - related attributes for the nest point coverage; (9) bird watch point - point in point coverage, or node in line coverage. Elevation (z) values and time of observation are stored in the attribute tables for possible 3-dimensional manipulation through the graphic user interface.
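The mapping between STATION types and geographic objects can be written down as a small lookup structure. The sketch below restates the mapping from the text; the Python names are hypothetical, and the parent links cover only the nest, egg and nestling relationships that the text spells out.

```python
from enum import Enum


class StationType(Enum):
    INDEPENDENT_POINT = 1
    TRANSECT_POINT = 2
    INDEPENDENT_AREA = 3
    TRANSECT_AREA = 4
    COLONY = 5
    NEST = 6
    EGG = 7
    NESTLING = 8
    BIRD_WATCH_POINT = 9


# STATION type -> allowed geographic object representations.
GEOGRAPHIC_OBJECT = {
    StationType.INDEPENDENT_POINT: ["point in point coverage"],
    StationType.TRANSECT_POINT: ["node in line coverage"],
    StationType.INDEPENDENT_AREA: ["polygon in polygon coverage"],
    StationType.TRANSECT_AREA: ["polygon in polygon coverage", "pixel in grid"],
    StationType.COLONY: ["region in polygon coverage"],
    StationType.NEST: ["point in point coverage"],
    StationType.EGG: ["attribute of nest point coverage"],
    StationType.NESTLING: ["attribute of nest point coverage"],
    StationType.BIRD_WATCH_POINT: ["point in point coverage",
                                   "node in line coverage"],
}

# Many-to-one containment: many nests per colony, many eggs/nestlings per nest.
PARENT = {
    StationType.NEST: StationType.COLONY,
    StationType.EGG: StationType.NEST,
    StationType.NESTLING: StationType.NEST,
}


def ancestors(station_type):
    """Return the chain of containing STATION types, nearest first."""
    chain = []
    while station_type in PARENT:
        station_type = PARENT[station_type]
        chain.append(station_type)
    return chain
```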

Soft Points and Soft Polygons

In this study, "soft point" means a point without definite x-y location; and "soft polygon" means a polygon without definite boundary. In ecosystem studies, researchers often have to deal with soft points and polygons when historical and field survey data are involved. Before GIS was implemented, many field observations were made with a verbal description of location, not explicit x-y coordinates. Even with well-established GIS concepts, some ecological features are difficult to describe at a definite x-y location or within a distinctive boundary. For example, a given fish species was observed in a certain area of a water body at a certain time. That "certain area" of the water body does not have a clear boundary. Such observations may apply to animals on land, plankton in water, or a floating plant mass in a wetland.

In the soft feature model, the geographic location of a soft point is described by its probability distribution in space. A soft point may appear at any known location with a certain probability. That known location is usually contained in a polygon. When the probability equals one at a known point, the soft point becomes a hard point. Where the probability equals zero, the soft point never appears. The line dividing nonzero from zero probability areas is the boundary of the probability distribution zone. One soft point may have multiple probability distribution polygons (Figure 4).
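A soft point can be sketched as a set of probability polygons with a lookup by location. The example below is an assumption-laden illustration rather than the study's implementation: it uses a standard ray-casting point-in-polygon test, treats each polygon's attribute as the probability associated with locations inside it, and uses made-up square zones and probability values.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Count edges that a horizontal ray from (x, y) to the right crosses.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def probability_at(x, y, zones):
    """Probability attached to the first zone containing (x, y), else 0.0."""
    for vertices, probability in zones:
        if point_in_polygon(x, y, vertices):
            return probability
    return 0.0


# One soft point with two probability distribution polygons (hypothetical).
zones = [
    ([(0, 0), (10, 0), (10, 10), (0, 10)], 0.7),
    ([(20, 0), (30, 0), (30, 10), (20, 10)], 0.3),
]
```

Locations outside every zone return zero, which corresponds to the area beyond the boundary of the probability distribution zone.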

The geographic location of a soft polygon is more difficult to describe than that of a soft point. Three parameters are required to define a polygon: size; shape; and the location of the center of gravity. When any of these parameters is uncertain, the polygon becomes a soft polygon. In theory, this leads to seven (7) types of soft polygons (Table 1). In this study, two types of soft polygons are discussed (Types 1 and 5 in Table 1), which are most common to ecological studies.
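The count of seven follows from simple enumeration: three parameters, each either certain or uncertain, give eight combinations, and the single all-certain combination is an ordinary hard polygon. A short sketch of that arithmetic (the enumeration order here is arbitrary and does not reproduce the type numbering of Table 1):

```python
from itertools import product

PARAMETERS = ("size", "shape", "center location")

# Every assignment of certain/uncertain to the three parameters.
combinations = [
    dict(zip(PARAMETERS, states))
    for states in product(("certain", "uncertain"), repeat=len(PARAMETERS))
]

# A polygon is soft as soon as any one parameter is uncertain.
soft_polygons = [c for c in combinations if "uncertain" in c.values()]
hard_polygons = [c for c in combinations if "uncertain" not in c.values()]
```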

For soft polygons with uncertain size, shape and location, the probability distribution patterns are similar to those of soft points. The probability polygons are stored as a coverage, each with an attribute value indicating the probability of any point in the probability polygon belonging to the soft polygon.

For soft polygons with known size and shape, the only uncertainty is location. The polygon's gravity center may be derived from its size and shape. The probability distribution of the center determines the probability distribution of the polygon. The soft point model discussed above also applies to the center of this soft polygon type. The size and shape of the polygon can be preserved in a separate coverage or incorporated into the probability distribution polygon coverage through an appropriate algorithm.
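For this second type, the known outline can be stored once and then placed at any candidate center. The sketch below is one possible approach, not the algorithm used in the study: it computes the area centroid with the standard shoelace formula and translates the outline so that its centroid falls on a candidate center. Function names and coordinates are hypothetical.

```python
def centroid(vertices):
    """Area centroid of a simple polygon (shoelace formula)."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    # Signed area is twice_area / 2, so the divisor 6 * area is 3 * twice_area.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


def place_at(vertices, center):
    """Translate the outline so that its centroid coincides with center."""
    cx, cy = centroid(vertices)
    dx, dy = center[0] - cx, center[1] - cy
    return [(x + dx, y + dy) for x, y in vertices]


outline = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]  # known size and shape
placed = place_at(outline, (10.0, 10.0))                     # one candidate center
```

Combining `place_at` with a probability lookup for the center gives the probability attached to each candidate placement of the polygon.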

Table 1. Types of Soft Polygons

Type  Size       Shape      Center Location  Note
1     uncertain  uncertain  uncertain        common
2     certain    uncertain  uncertain        uncommon
3     uncertain  certain    uncertain        rare
4     uncertain  uncertain  certain          uncommon
5     certain    certain    uncertain        common
6     certain    uncertain  certain          uncommon
7     uncertain  certain    certain          rare
8     certain    certain    certain          hard polygon, a special case of soft polygon

DATABASE IMPLEMENTATION AND MANAGEMENT

Database Implementation

The initial implementation of a database includes specifying a directory structure on the data server, determining a directory and file naming convention, specifying read/write permissions and work group arrangements, selecting precision and projection systems for coverages, setting up a standard template for attribute files, and establishing metadata standards.

Directory Structure and Naming Convention

The GIS database directory structure was set up on the virtual server. It includes sub-directories for Arc/Info coverages, ArcView project files, images, map files, program code, and document files. A naming convention was developed to indicate the subject and format of each folder and file.

Access Permissions

Access permissions are assigned according to whether an individual is a database developer or user. Developers have read, write, and execute permissions, while users are given read and execute permissions. Thus, only developers are able to modify database files. UNIX "user groups" were used to implement access permissions.
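A minimal sketch of such a scheme on a POSIX system is shown below. It assumes that developers are the owner and owning group of the database directories and that ordinary users fall under "other"; the actual group layout used at the SFWMD is not described, and the mode constant and helper names are hypothetical.

```python
import os
import stat
import tempfile

# Assumed mapping: rwx for owner and developer group, r-x for all other users.
DATABASE_DIR_MODE = 0o775


def apply_database_permissions(path, mode=DATABASE_DIR_MODE):
    """Set the permission bits on path and return the bits actually stored."""
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)


def describe(mode):
    """Render directory permission bits in the familiar ls -l style."""
    return stat.filemode(mode | stat.S_IFDIR)


demo_dir = tempfile.mkdtemp()          # stands in for a database directory
applied = apply_database_permissions(demo_dir)
```

Here `describe(applied)` renders the familiar `drwxrwxr-x` pattern: developers can modify the directory, while other users can only list and read it.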

Precision and Projection

Single precision (7 significant digits) and double precision (15 significant digits) are the only alternatives for storing coverage coordinates in Arc/Info. Double precision coverages provide a more precise geographic location but require more storage space. Because the study area is large (hundreds of square miles) while ecosystem parameters vary at fine scales (square feet), double precision is used for most coverages.
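The practical effect of the two precisions can be checked with ordinary 32-bit and 64-bit floating-point numbers, which carry roughly 7 and 15 significant decimal digits respectively. The sketch assumes these IEEE formats are a fair stand-in for Arc/Info single and double precision storage; the coordinate value is made up.

```python
import struct


def to_single(value):
    """Round-trip a number through 32-bit floating-point storage."""
    return struct.unpack("f", struct.pack("f", value))[0]


def to_double(value):
    """Round-trip a number through 64-bit floating-point storage."""
    return struct.unpack("d", struct.pack("d", value))[0]


coordinate_ft = 712345.67              # hypothetical planar coordinate in feet
single_error = abs(to_single(coordinate_ft) - coordinate_ft)
double_error = abs(to_double(coordinate_ft) - coordinate_ft)
```

For coordinates of this magnitude, adjacent 32-bit values are 0.0625 ft apart, so single precision cannot resolve differences of about an inch, while the 64-bit round trip is exact.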

The projection system is based on the SFWMD GIS database, and is State Plane, zone 3601 (Florida East Zone), NAD27. When the SFWMD database migrates to NAD83, all coverages and images in the Lake Okeechobee database will be transformed accordingly. The transformation will, most likely, use the Arc/Info PROJECT function.

Attribute Template

Minimal attribute data are stored with the coverages in the GIS database. Most ecological data reside in ORACLE and are linked to geographic features in the coverages by a unique ID. This ID is the only external (user-specified) attribute in the coverages. Table 2 shows some typical attribute templates.

Table 2. Typical Attribute Templates for Point, Polygon, Line and Node Features
_________________________________________________________________________________
Point and Polygon Attribute:
COLUMN   ITEM NAME        WIDTH OUTPUT  TYPE N.DEC  ALTERNATE NAME     INDEXED?
    1  AREA                   8    18     F      5                        -
    9  PERIMETER              8    18     F      5                        -
   17  COV-NAME#              4     5     B      -                        -
   21  COV-NAME-ID            4     5     B      -                        -
   25  UNIQUE-ID              4     5     B      -                     Indexed
_________________________________________________________________________________
Line Attribute:
COLUMN   ITEM NAME        WIDTH OUTPUT  TYPE N.DEC  ALTERNATE NAME     INDEXED?
    1  FNODE#                 4     5     B      -                        -
    5  TNODE#                 4     5     B      -                        -
    9  LPOLY#                 4     5     B      -                        -
   13  RPOLY#                 4     5     B      -                        -
   17  LENGTH                 8    18     F      5                        -
   25  COV-NAME#              4     5     B      -                        -
   29  COV-NAME-ID            4     5     B      -                        -
   33  UNIQUE-ID              4     5     B      -                     Indexed
_________________________________________________________________________________
Node Attribute:
COLUMN   ITEM NAME        WIDTH OUTPUT  TYPE N.DEC  ALTERNATE NAME     INDEXED?
    1  ARC#                   4     5     B      -                        -
    5  COV-NAME#              4     5     B      -                        -
    9  COV-NAME-ID            4     5     B      -                        -
   13  UNIQUE-ID              4     5     B      -                     Indexed
_________________________________________________________________________________

"COV-NAME-ID" is an internal ID that may be automatically assigned a new value by Arc/Info operations. Thus, it is not used as the unique external ID. The user-specified "UNIQUE-ID" is indexed to decrease process times across the Arc-Oracle link.

Metadata Standards

An on-going project at the SFWMD is standardization of metadata formats for GIS databases. One project component is the development of a graphic user interface to view and edit metadata. This component will automatically extract metadata information from GIS files, provide multiple choices whenever applicable, and prompt users to enter required information. The interface also will check for completion of user input, re-organize entries into a standard metadata format, and save input at a logical location using a standard naming convention. With this interface, users may define one metadata format for a group of similar coverages, or copy the metadata of one coverage into another, and selectively edit some items to document differences.

One major difference between the SFWMD user interface and the "DOCUMENT" function in Arc/Info ver.7 is the metadata format. "DOCUMENT" saves metadata in INFO, while the SFWMD interface saves metadata as an ASCII text file. ASCII files can be viewed without an Arc/Info license. Moreover, the SFWMD interface generates metadata following the SFWMD standard, while "DOCUMENT" is more general and was designed for a broader group of users. The Lake Okeechobee GIS database follows the SFWMD metadata standards.
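The behavior described for the metadata interface (checking that required entries are complete, writing ASCII text, and copying one coverage's metadata to another with selective edits) can be sketched in a few functions. The field names below are hypothetical and do not reproduce the SFWMD metadata standard, which is not given here.

```python
# Hypothetical required entries; the real SFWMD standard is not reproduced.
REQUIRED_FIELDS = ("title", "projection", "datum", "source")


def missing_fields(metadata):
    """List required fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS if not str(metadata.get(f, "")).strip()]


def to_ascii(metadata):
    """Render metadata as plain 'KEY: value' lines in a fixed order."""
    return "\n".join(
        f"{key.upper()}: {metadata[key]}" for key in sorted(metadata)
    ) + "\n"


def copy_with_edits(template, **changes):
    """Copy one coverage's metadata and override selected items."""
    merged = dict(template)
    merged.update(changes)
    return merged


stations_meta = {
    "title": "Sampling stations",          # illustrative entry
    "projection": "State Plane, zone 3601",
    "datum": "NAD27",
    "source": "LOES field survey",         # illustrative entry
}
nests_meta = copy_with_edits(stations_meta, title="Nest locations")
```

Because the output of `to_ascii` is plain text, it can be read with any text viewer, which matches the stated advantage of ASCII metadata over metadata stored in INFO.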

Database Management

The management of a GIS database, like that of other databases, involves data input, update, conversion, QA/QC, and backup. The unique aspect of GIS data management is the coordination between geographic and attribute data. Geographic data require special techniques and procedures for input, update, conversion, QA/QC, and backup, while the corresponding attribute data can be managed as regular tabular data, with a unique ID linking them to the geographic features. Once the GIS database structure is set up, data in ARC must be managed by a well-trained GIS professional, and data in ORACLE must be managed by a DBMS professional.

DATA QUERY AND RETRIEVAL

The GIS database is accessible to all SFWMD staff for data query and retrieval. A generic user interface is being developed to support users with minimum GIS training to search, display, print or plot data from the database. Thus, the interface must be tailored to user needs.

The first step in interface development was to interview end users and identify their needs for data display, query and retrieval (Montgomery and Schuch, 1993). Considering the internal structure of the database, user demand and the facilities available, ArcView2 was selected as the interface platform. Avenue, the object-oriented macro language for ArcView, is used to communicate with the ORACLE database, customize the display environment, structure query statements, and standardize output formats.

A prototype was developed to present the look and feel of the interface, and its design was based on user comments. The interface will be modified as further user input is received.

CONCLUSIONS AND DISCUSSION

This paper documents the major steps in developing a Lake Okeechobee GIS database. The entire process (data server set-up, data model design, database implementation, data management standardization, and user interface development) focused on serving the LOES. Database requirements and existing SFWMD hardware/software significantly influenced the overall development approach. The paper also introduces the concept of modelling soft geographic features and the approach of establishing a virtual data server across the network.

Database development is a constant trade-off between "what it should be" and "what it could be." Instead of the "perfect database," the final product of this project is a "functional database," developed under the constraints of available hardware/software. Database optimization is a long-term task. By having a functional database first, and improving it constantly, the database will incrementally approach an ideal design that best serves its users.

REFERENCES

Aumen, N.G., 1995. "The history of human impacts, lake management, and limnological research on Lake Okeechobee, Florida (USA)", Advances in Limnology (N.G. Aumen and R.G. Wetzel eds.). Stuttgart: Schweizerbart, 45:1-16.

Cowen, David J., John R. Jensen, Patrick J. Bresnahan, Geoffrey B. Ehler, Derek Graves, Xueqiao Huang, Chris Wiesner and Halkard E. Mackey, Jr., 1995. "The design and implementation of an integrated geographic information system for environmental applications", Photogrammetric Engineering and Remote Sensing, 61(11):1393-1404.

Lostal, Sergio, 1995. Unpublished data, South Florida Water Management District, West Palm Beach, Florida.

Montgomery, Glenn E. and Harold C. Schuch, 1993. GIS Data Conversion Handbook. GIS World, Inc., Fort Collins, Colorado.

University of Florida, 1991. Ecological Studies of the Littoral and Pelagic System of Lake Okeechobee, Annual Report prepared for South Florida Water Management District, West Palm Beach, Florida.

-----

Copyright(c) 1996, American Water Resources Association