
From Villages to Water Sources

GIS Concepts

Several GIS concepts were incorporated in the analysis for this project. Below is a discussion of some of these concepts.


Data Preparation Concepts

1) Projection

2) Georeferencing

3) Heads-up Digitizing, Creating New Data

4) Converting Vector to Raster Data


Projection:

The term comes from the idea of passing light from a single source through a transparent globe.  The lines on the globe (graticule plus geographical information) are then projected onto a surface on the far side.  There are many different types of projection, each with benefits and drawbacks.  For instance, a projection can preserve form or shape (conformal), area (equal-area), distance (equidistant) or direction (azimuthal).  However, no single projection can preserve all of these properties at once.  Each projection is therefore a compromise and needs to be selected based on the intended use of the map.

Projection Types:

  • Geometric, where the projection surface is a plane (touches the globe at a single point), a cylinder (touches the globe along a great circle), or a cone (touches the globe along a small circle); see figure below.
  • Mathematical, where the projection is defined by a mathematical formula.

Figure: Projection types (from ESRI ArcGIS Desktop Help)

Projections can be made with any of three different light source positions:

  • Gnomonic: the light source is located at the center of the globe.  This is useful for aviation because every great circle (along which the shortest path between two points lies) plots as a straight line.
  • Stereographic: the light source is located at the antipode of the point of tangency.  This is used to create conformal projections.
  • Orthographic: the light source is located at infinity.  True direction from the center point of the map is preserved in this projection.
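The three light source positions can be compared numerically. The sketch below assumes a plane tangent to a globe of unit radius and uses the standard textbook formulas for how far from the map center a point plots, given its angular distance from the point of tangency. The function name `radial_distance` and the sample angles are illustrative only and are not part of the project.

```python
import math

def radial_distance(c_deg, light_source, radius=1.0):
    """Distance from the map center at which a point plots, for a point
    c_deg degrees of arc away from the point of tangency."""
    c = math.radians(c_deg)
    if light_source == "gnomonic":          # light at the center of the globe
        if c_deg >= 90:
            raise ValueError("gnomonic cannot show points 90 degrees or more away")
        return radius * math.tan(c)
    if light_source == "stereographic":     # light at the antipode
        if c_deg >= 180:
            raise ValueError("stereographic cannot show the antipode itself")
        return 2.0 * radius * math.tan(c / 2.0)
    if light_source == "orthographic":      # light at infinity (parallel rays)
        if c_deg > 90:
            raise ValueError("orthographic shows only one hemisphere")
        return radius * math.sin(c)
    raise ValueError("unknown light source: " + light_source)

# Compare how quickly each projection stretches or compresses toward the edge.
for c_deg in (10, 45, 80):
    row = [round(radial_distance(c_deg, name), 3)
           for name in ("gnomonic", "stereographic", "orthographic")]
    print(c_deg, row)
```

Near the center the three values are almost identical. Toward the edge the gnomonic value grows without bound while the orthographic value levels off, which is one way to see why each light source position suits a different purpose.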

Every projection has some distortion.  For tangent projections (where the projection surface touches the globe at one point or along one line) the distortion is least at the point or line of contact and increases with distance from it toward the edge of the map.  The projection surface can also be secant to the globe.  In this case the surface cuts through part of the globe's interior, so it intersects the globe along a circle instead of at a point, or along two lines instead of one.  Again the distortion is least along the circle or lines of intersection and increases both inward (toward the center of the circle or between the two lines) and outward toward the edge of the map.

The Universal Transverse Mercator (UTM) projection was used for this project.
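UTM divides the globe into 60 zones, each 6 degrees of longitude wide, and each zone has its own central meridian. As a minimal sketch, the zone for a given longitude can be computed as follows. The function names and the sample longitudes are hypothetical (the project's actual zone is not stated here), and the special zone exceptions around Norway and Svalbard are ignored.

```python
import math

def utm_zone(lon_deg):
    """Standard 6-degree UTM zone number (1 to 60) for a longitude in
    degrees east of Greenwich (west longitudes are negative)."""
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    # The modulo maps longitude 180 back into zone 1.
    return int(math.floor((lon_deg + 180.0) / 6.0)) % 60 + 1

def central_meridian(zone):
    """Central meridian of a UTM zone, in degrees east."""
    return zone * 6 - 183

# Hypothetical longitudes, for illustration only.
for lon in (-105.0, 0.0, 93.5):
    zone = utm_zone(lon)
    print(lon, zone, central_meridian(zone))
```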


Georeferencing:

            This refers to the process of assigning accurate coordinates in the GIS data frame to an image, such as a scanned map.  In other words, it is the process of placing the image at the correct location, scale and orientation in the GIS data frame.  For a scanned map, points of known coordinates on the image (determined from the graticule of the scanned map) are matched to the points with identical coordinates in the GIS data frame.  This is done with the Georeferencing Toolbar in ArcMap.
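Matching control points amounts to fitting a transformation from image (column, row) positions to map (x, y) coordinates. The sketch below fits a six-parameter affine transformation by least squares in plain Python. It is an illustration of the idea under stated assumptions, not the method ArcMap uses internally, and the control point values and function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("control points are collinear or too few")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_affine(pixel_pts, map_pts):
    """Least-squares fit of x = a*col + b*row + c and y = d*col + e*row + f."""
    rows = [[float(c), float(r), 1.0] for c, r in pixel_pts]
    # Normal equations: (A^T A) p = A^T b, solved once per map axis.
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
           for i in range(3)]
    params = []
    for axis in (0, 1):
        atb = [sum(row[i] * pt[axis] for row, pt in zip(rows, map_pts))
               for i in range(3)]
        params.append(solve_linear(ata, atb))
    return params  # [[a, b, c], [d, e, f]]

def apply_affine(params, col, row):
    """Convert an image (col, row) position to map (x, y) coordinates."""
    (a, b, c), (d, e, f) = params
    return a * col + b * row + c, d * col + e * row + f

# Hypothetical control points: image corners matched to UTM-style coordinates.
pixel_pts = [(0, 0), (1000, 0), (0, 1000), (1000, 1000)]
map_pts = [(500000.0, 4100000.0), (530000.0, 4100000.0),
           (500000.0, 4070000.0), (530000.0, 4070000.0)]
params = fit_affine(pixel_pts, map_pts)
print(apply_affine(params, 500, 500))
```

With more than three control points the fit is overdetermined, so the leftover mismatch at each point gives a rough measure of the georeferencing error.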

Heads-up Digitizing, Creating New Data:

            This is the process of creating new vector data by tracing features of a base map displayed on screen (such as a georeferenced map image) during an edit session in ArcMap.  Before the edit session can begin, one must use ArcCatalog to create the feature class that will contain the digitized vector data.  During this step it is necessary to specify the type of feature (point, line, polygon) and the feature attributes.  Back in ArcMap, vector data can then be added to the new, empty feature class.  An edit session is started from the Editor Toolbar, and vector data are created with the Sketch Tool.  When digitizing is complete, the edit session needs to be ended and the edits saved.

Converting Vector to Raster Data:

            Sometimes data are not useful in the vector format (for instance when raster analysis is the goal).  Any feature layer can be converted to a raster GRID format.  Vector features are converted to raster GRID cells by assigning each cell in the raster GRID the value of the vector feature that falls within that cell.  There are several methods used to assign cell values in the raster GRID when more than one vector point falls within a single cell (see Theobald 2007, p. 291).  If more than one vector line feature falls within a single cell, the cell will be assigned the value of the first feature encountered during processing.  Vector polygon features are converted to raster GRID cells based on one of three methods: centroid, dominant or most important (see Theobald 2007, p. 291).  It should be noted that converting from vector data to raster data will introduce some error to the dataset, because it is not possible to portray the original vector data perfectly as a raster dataset.  The selection of cell size for the raster GRID is a large factor in how well the converted data will represent the original vector dataset.  The smaller the cell size, the more detail will be preserved, but the file will be larger as a result.
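The effect of cell size can be illustrated with a small sketch. It assumes a cell-center rule, in which a cell takes the polygon's value when the center of the cell falls inside the polygon. We take this to be similar in spirit to the centroid method mentioned above, but that correspondence is an assumption. The triangle, the extent and the function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon, given as a
    list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge spans the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(polygon, value, extent, cell_size, nodata=0):
    """Assign `value` to every cell whose center lies inside the polygon."""
    xmin, ymin, xmax, ymax = extent
    ncols = int(round((xmax - xmin) / cell_size))
    nrows = int(round((ymax - ymin) / cell_size))
    grid = []
    for r in range(nrows):
        y = ymax - (r + 0.5) * cell_size              # row 0 is the top row
        row = []
        for c in range(ncols):
            x = xmin + (c + 0.5) * cell_size
            row.append(value if point_in_polygon(x, y, polygon) else nodata)
        grid.append(row)
    return grid

def raster_area(grid, value, cell_size):
    """Area covered by cells holding `value`."""
    return sum(row.count(value) for row in grid) * cell_size ** 2

# Hypothetical triangle in a 100 x 100 unit extent.
triangle = [(0, 0), (95, 0), (0, 95)]
true_area = 0.5 * 95 * 95
for cell_size in (50, 25, 10):
    grid = rasterize(triangle, 1, (0, 0, 100, 100), cell_size)
    print(cell_size, raster_area(grid, 1, cell_size), true_area)
```

In this example the rasterized area approaches the true polygon area as the cell size shrinks, while the number of cells that must be stored grows from 4 to 100.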


Analysis Concepts

1) Reasoning

2) Measurement

3) Optimization

4) Uncertainty


Reasoning:

This project demonstrates a complicated form of reasoning. As people, we are used to asking an open-ended spatial question and having another person understand it and provide an answer. However, when a vague question is asked of a computer, and more specifically a GIS, it cannot answer unless the question is broken down into smaller and more concrete questions. The ‘reasoning’ question we are asking is “What is the best way for someone to get from a village to a place where that person can access water?” Through the analysis, this has been broken down into more concrete questions: “Based on a slope raster, what is the cost direction grid for water point 1?” and “Based on the cost direction grid for water point 1, what is the least-cost path to travel from water point 1 to each village with fewer than 20 people?” After addressing these questions and arriving at the least-cost paths from each water point to each village, we realized we had left out a concrete question for the GIS, one that would be implicit when asking a person how to get from a village to a water point along the river. Because the cost-direction grid is based only on slope, some of the paths required people to cross the river or lake. Notice on the figure below that the path from water point 1 to the southern villages proposes crossing the river four times as well as swimming across Dipa Lake several times!

Initial results showing path through lake and crossing river

While this makes sense to the computer, it would not make sense for villagers travelling several kilometers to access water, so we had to modify the reasoning question to “What is the best way for someone to get from a village to a place where that person can access water, avoiding the lake and without crossing the river?” The only way for the least-cost path analysis to avoid the river and lake was to incorporate these features into the slope raster. We did this by reclassifying the water bodies raster (containing both the river and the lake) so that all water cells were equal to 100, and then adding this raster to the slope raster. The summed raster was then reclassified so that all values ≥100 were set to 100. This way, when the cost-direction grid is created, crossing a water body is assigned a very high cost and is thus avoided when the least-cost path is calculated.
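The reclassify, add and cap steps described above can be sketched on small nested lists standing in for the rasters. The slope and water values below are made up for illustration, and the function name `build_cost_raster` is hypothetical. In the project itself these steps were carried out with GIS raster tools.

```python
WATER_COST = 100  # value assigned to every water cell, as described in the text

def build_cost_raster(slope, water):
    """Add a water penalty to a slope raster and cap the result at WATER_COST.

    `slope` holds slope values and `water` holds 1 for river or lake cells
    and 0 elsewhere. Both are lists of rows with the same shape.
    """
    cost = []
    for slope_row, water_row in zip(slope, water):
        row = []
        for s, w in zip(slope_row, water_row):
            penalty = WATER_COST if w else 0          # reclassify: water -> 100
            row.append(min(s + penalty, WATER_COST))  # values >= 100 -> 100
        cost.append(row)
    return cost

# Hypothetical 3 x 3 rasters: a river runs down part of the middle column.
slope = [[2, 5, 12],
         [3, 8, 20],
         [1, 4, 9]]
water = [[0, 1, 0],
         [0, 1, 0],
         [0, 0, 0]]
cost = build_cost_raster(slope, water)
for row in cost:
    print(row)
```

Land cells keep their slope value, while every water cell ends up at exactly 100 regardless of the slope beneath it.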


Measurement:

The analyses we present implicitly use measurements to determine the least-cost path from a village to a water access point. Because we are interested not only in the shortest path, which would be a straight-line distance, but in the least-cost path, we need a way of taking elevation changes into account. This is done through the slope raster. We should remember that a shortcoming of the slope raster is that it is derived from a 90 meter DEM, in which each cell averages the elevation over an 8100 m² area.


Optimization:

Optimization means minimizing or maximizing a function by systematically choosing values from within an allowed set. The least-cost analysis is a type of optimization in that it minimizes energy expenditure, using minimum slope as a proxy. The cost weight grid gives the ‘cost’ in all directions from the sources, which here are the water points. This form of raster analysis was refined by Xu and Lathrop Jr. (1996), who reduced artificial 'zig-zagging' paths by increasing the number of neighboring cells of the cost raster used to calculate the cost from 8 to 16. The ‘cost’ can also be thought of as different levels of friction for any type of movement, even habitat connectivity (Adriaensen et al. 2003). The least-cost path is thus the path that can be travelled with the greatest ease.
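As a minimal sketch of this kind of optimization, the code below finds a least-cost path across a small cost raster using Dijkstra's algorithm with the 8 neighboring cells. The cost of a step is assumed to be the average of the two cell costs multiplied by the distance moved (1 for orthogonal moves, about 1.414 for diagonal moves). That step rule, the grid values and the function name are assumptions for illustration and are not taken from the project's GIS software.

```python
import heapq
import math

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def least_cost_path(cost, source, target):
    """Return (path, accumulated_cost) from source to target, where cells
    are (row, col) tuples and `cost` is a list of rows."""
    nrows, ncols = len(cost), len(cost[0])
    best = {source: 0.0}       # lowest accumulated cost found so far per cell
    came_from = {}             # back-links used to rebuild the path
    queue = [(0.0, source)]
    while queue:
        acc, cell = heapq.heappop(queue)
        if cell == target:
            break
        if acc > best[cell]:
            continue           # stale queue entry
        r, c = cell
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < nrows and 0 <= nc < ncols):
                continue
            step = math.hypot(dr, dc) * (cost[r][c] + cost[nr][nc]) / 2.0
            new_acc = acc + step
            if new_acc < best.get((nr, nc), math.inf):
                best[(nr, nc)] = new_acc
                came_from[(nr, nc)] = cell
                heapq.heappush(queue, (new_acc, (nr, nc)))
    if target not in best:
        raise ValueError("target cannot be reached from source")
    path = [target]
    while path[-1] != source:
        path.append(came_from[path[-1]])
    path.reverse()
    return path, best[target]

# Hypothetical raster: cost 100 marks water cells, 1 marks easy ground.
grid = [[1, 100, 1],
        [1, 100, 1],
        [1, 1, 1]]
path, total = least_cost_path(grid, (0, 0), (0, 2))
print(path, round(total, 3))
```

Here the straight route across the water cell would cost 101, so the search goes around the bottom of the water instead, mirroring the way the project's paths were steered away from the river and lake.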


Uncertainty:

A certain amount of error is present in the final product of any GIS analysis. For the least-cost path analysis presented here, the end user of our maps should be aware that the product is based on the best available data, which have undergone several standardized GIS processing steps. Any error present in the original data will propagate through the processing chain and become part of the final product. In addition, the processing chain can introduce further error into the dataset through reclassification, resampling, data generalization and conversion (such as vector to raster), to name a few. This error may accumulate in an additive or multiplicative fashion (Lunetta 1991).

Some specific sources of error for our project include: potential human error in locating villages and water access points (these points were hand drawn from memory onto our paper map); georeferencing error and digitizing error, since both steps depend on careful mouse clicks by the analyst; error associated with the conversion of our digitized vector data to raster data; and finally error associated with the least-cost distance analysis algorithms. As noted above, each cell value of the DEM used in our analysis averages the elevation over an 8100 m² (90 m x 90 m) area. This is the best DEM data available for the area, but it does not capture topography at the fine scale that someone in the field would consider when choosing the actual best path. Therefore, our analysis results are best treated as coarse-scale corridors within which the actual least-cost path is likely to be found. In short, the identified least-cost corridors are a good place to begin a search for the least-cost path, but refinement at a finer scale will need to be done in the field.
