Quality Report for
Dane County Soil Survey digital files

Report prepared 3 September 1985
by N. Chrisman, Dane County Land Records Project, UW-Madison
in cooperation with US Soil Conservation Service, Wisconsin Office

This report appeared in NCDCDS Report 7, pages 78-88.

This report summarizes the quality which can be expected from the digital records of the Dane County, Wisconsin soil survey in the version delivered by the Dane County Land Records Project (DCLRP) to the US Soil Conservation Service (SCS) National Cartographic Office in 1985. Dane County, Wisconsin occupies 1200 square miles in south central Wisconsin. The soil survey consists of 181 sheets reproduced at 1:15840, while the products delivered in digital form consist of parts of 34 quadrangles in the 7.5 minute series.

As suggested in the NCDCDS Interim Proposed Standard (IPS), a quality report is intended to communicate information about a digital product. Each user must evaluate that information to determine whether the data is suitable for a particular use.

This report consists of five parts: lineage, positional accuracy, attribute accuracy, logical consistency and completeness (the components required by IPS).

Lineage

This section relates the history of the Dane County soil survey from the original materials to the final digital product. This account does not cover all aspects, but tries to cover any information with potential impact on quality.

Immediately inside the cover of the printed soil survey, a box contains this information:
Major fieldwork for this soil survey was completed in the period 1966-1971. Soil names and descriptions were approved in 1972. Unless otherwise indicated, statements in the publication refer to conditions in the county in 1972.
The actual history of the work is more complex, and it becomes rather difficult to assign a single date to the product.

The compilation of soils maps proceeds in two phases, from advance field sheets to the printed report. The advance field sheets were compiled on air photographs taken between June and August 1962. The field sheets show dates ranging from 1968 to 1972. A second flight with dates from August to October 1974 produced the photos for the printed report. The soil maps were compiled on the 1974 base using the field sheets and corrections. The printed report contains the legend "Issued January 1978". The Dane County Land Records Project took the final maps and converted them to digital form in the period 1983-1985.

No specific information is available about the 1962 photography. Since the field sheet data was recompiled, this may not be important.

The 1974 photography was flown at elevations of about 1300-1500'. The camera (serial number UAg 477) took 9X9" negatives with a 152.38 mm lens (6"). Originals of these photos are stored in the ASCS Aerial Photography Field Office in Salt Lake City, Utah. Diapositive copies were made for the DCLRP and are available from the SCS State Office. The correspondence of negatives to soil sheets is shown on Map 2.

The photographs were rectified to remove the effects of tilt, and a set of positive prints was made at publication scale (1:15840). The process was performed at the Lincoln, Nebraska regional office (now combined with the Fort Worth, Texas national facility). The DCLRP has not been able to determine the methods used to orient and scale the photos. However, checks performed in transferring the soils data into known coordinates indicate that the photos are reasonably planimetric (although relief displacement is not corrected; see below). According to current SCS National Office guidelines, the soil maps are not sufficiently accurate to merit entry into the national digital data base (see National Cartographic Manual, draft of 9/7/82; NHQ/CRS Issue Paper "digitizing detailed soil surveys from accurate base maps versus inaccurate base maps", rev. 9/7/82). A direct test of this assumption is covered in the section on positional accuracy.

The soils boundaries were penciled onto enlargements of the 1974 photos (2.5 X to the publication scale of 1:15840). Presumably, the advance field sheets were used as a compilation source along with new field work. Some boundaries based on slope were determined with pocket stereo viewers, using adjacent photos.

A major process in soil mapping relates to the attribute system - the soil classification. In the advance field sheet stage, a three part numeric code was placed on the maps. The three parts have a correspondence to the three parts of the alphabetic code shown on the final maps: the four digit soil class became a two letter code, the numeric percent slope became the classes A .. E, and the eroded code of 2 or blank was retained. In the field process, the soil scientist could classify a particular area as a specific soil. In the office process, this soil could be reclassified into a cognate soil for a number of reasons, such as not having enough of the soil class in the county, or to enforce consistency between interpreters. There are also national directives to consolidate classification systems so that the effective date of 1972 is crucial to understand the type of soil classification used.

From the pencil product on the photobase, the published soil map was developed. A fresh mylar was pin registered, and the pencil lines were redrafted with a liquid ink drafting pen. In most cases, the pen width was about .01" (.25 mm), although there are variations in line quality. Although the map finisher primarily transferred the pencil lines, cartographic rules were also applied to eliminate narrow areas or to simplify detail around roads and other features.

For the Dane County survey, the soil labels were applied with stickup lettering on a separate pin-registered mylar. Non-soil linear features, such as roads and drainage, were applied to the same overlay as the soils boundaries. (This separation has a bearing on the digital scanning process.)

A number of checks were built into the finishing process. Each sheet was "matched" with adjacent ones. Even though the photobases could be different due to different image centers, the soil lines were made to agree. Classifications across the sheet edges were also examined (further information on the reliability of this process appears below under logical consistency).

Another check performed during map finishing consists of "coloring" the soil polygons to ensure that labels are consistent (no lines are missing) and that no unnecessary lines were left in. Considering the geometric complexity of some of the sheets (the driftless area leads to convoluted slope-based polygons), this process was tedious and errors did persist to be detected in later stages (see logical consistency).

The map finisher also included PLSS section corners and state plane coordinate tick marks. The printed maps have a printed caution nearly hidden in the binding of the volume:
Coordinate grid ticks and land division corners, if shown, are approximately positioned.
This caution is well-founded, and is considered below under positional accuracy.

The published maps were printed from the mylar originals, but the printed maps have no direct relation to the digital product.

The DCLRP has undertaken two major soil digitizing efforts. The first, a manual one, digitized 66 soils sheets (out of 181) between June 1983 and January 1984. The second, based on an inexpensive scanner, is still under development, but its product will complete the county during 1986. This quality report is limited to the manually digitized products.

Digitizing began with direct positive copies of the soil map originals produced by a contact process at Master Blueprint in Madison, Wisconsin. The copies were made of the line work overlay and the label overlay, so that the line digitizing and point label digitizing were performed from the same product. In a few cases, the label layer original had been lost, so the printed map had to be used instead. The positional accuracy of the labels is not crucial to this process. The chemical residues of the copying process (perhaps due to incomplete fixation or washing) were sufficient to affect the electrical resistance on the digitizer surface and degrade accuracy. When the copies were washed in cold water, the problem abated.

Tick marks were placed on the mylar copies to bracket the image area. The tick marks were intended to form a rectangle 15" X 9", although hand placement could create errors of a few hundredths of an inch.

Digitizing was performed at two sites: UW Land Information and Computer Graphics Facility (LICGF), and Wisconsin Dept. of Natural Resources (WDNR) Bureau of Information Management. LICGF used a TALOS 660 backlight table connected to an ORION microprocessor. The ORION had a 512 X 512 pixel plasma screen and 8" floppy disk drives (see Chrisman and Sullivan, 1983, for procedures used). The mylar sheet was placed arbitrarily on the table (intentionally at a diagonal to avoid a known bug in the digitizer firmware). Firmware in the TALOS (SMART 3.0) was used to rotate and translate the coordinate system to agree with the tick marks. The lower left was forced to (0,0) and the lower right was used to align the X axis. The upper right point was read to confirm a reading sufficiently close to (15,9). (Note that the manual location of the tick marks did not require positional accuracy because the inch scaling of the device was unaltered.) The TALOS floating point calculations seem to be accurate within the accuracy of the digitizer.
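
The registration described above (translate so the lower left tick is the origin, rotate so the lower right tick lies on the X axis, leave the inch scale alone) can be sketched in Python. This is only an illustration of the geometry, not the SMART 3.0 firmware; the function name `register` and the sample table readings are hypothetical.

```python
import math

def register(lower_left, lower_right):
    """Return a function mapping raw table inches to sheet inches.

    The lower-left tick becomes (0, 0) and the lower-right tick fixes
    the direction of the +X axis. No scaling is applied, so the inch
    units of the table are preserved.
    """
    x0, y0 = lower_left
    theta = math.atan2(lower_right[1] - y0, lower_right[0] - x0)
    c, s = math.cos(theta), math.sin(theta)

    def to_sheet(point):
        dx, dy = point[0] - x0, point[1] - y0
        # Rotate by -theta so the tick baseline lies along the X axis.
        return (c * dx + s * dy, -s * dx + c * dy)

    return to_sheet

# Hypothetical table readings for a 15 x 9 inch sheet laid at a 30 degree diagonal.
angle = math.radians(30.0)
ll = (5.0, 4.0)
lr = (ll[0] + 15.0 * math.cos(angle), ll[1] + 15.0 * math.sin(angle))
ur = (lr[0] - 9.0 * math.sin(angle), lr[1] + 9.0 * math.cos(angle))

to_sheet = register(ll, lr)
ur_sheet = to_sheet(ur)
# Confirm that the upper-right tick reads sufficiently close to (15, 9).
ur_ok = abs(ur_sheet[0] - 15.0) < 0.05 and abs(ur_sheet[1] - 9.0) < 0.05
```

Because only a rotation and a translation are applied, an error in hand placement of a tick shifts or turns the sheet coordinates slightly but does not stretch them. The 0.05 inch acceptance limit in the check is an assumed value chosen for the example.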

The manufacturer's specification quotes a "repeatability" of .01 inch for this device. This figure could be interpreted as plus or minus .005 inch, which is the result obtained in some tests performed on this equipment by Mills (1982).

FORTRAN programs on the ORION controlled the process and wrote the results to the 8" disks. One program was used to capture the linework in unstructured form (as "spaghetti") and another for the label points. The plasma screen (8.5" X 8.5" with resolution of 512 X 512) provided almost the same line width as the original when the screen window covered one half of the soils sheet. The plasma screen also permitted selective erasure of lines if they were deleted. Graphic feedback allowed some gross errors to be detected, but the screen was not registered to the map, so the fidelity of line following could not be checked.

When the TALOS operated in point mode, the ORION could handle the data stream; the difficulty arose in line following mode. (The problem was partially that the IO ports on the ORION could only operate at 2400 baud, and also that the FORTRAN code was not very fast on the obsolete 8 bit processor.) In tests of line following in point mode, the operator was usually too stingy in recording the curvature of the soils boundaries. The SCS guidelines call for digitizing to recreate the graphic product within one linewidth, so line following mode was required. The TALOS controller was set to use distance sampling with a tolerance of .03 inch. This figure is a compromise between graphic fidelity and the capacity of the communications between the TALOS and ORION. Even at this tolerance, the TALOS could get ahead of the ORION when the operator moved the cursor too fast. As there was no bell on the ORION to alert the operator, and as the operator was probably not looking at the screen, data was occasionally lost. The result was flat sections where curvature was missing. Where the flat sections detracted from the product at the checkplot stage, they were fixed in the final edit.
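
Distance sampling can be illustrated with a short Python sketch. It assumes the simplest rule, in which a cursor position is recorded only once it lies at least the tolerance away from the last recorded position; the exact rule used by the TALOS controller is not documented here, and the function name and cursor positions are hypothetical.

```python
import math

def distance_sample(stream, tolerance):
    """Record a point only when it lies at least `tolerance` from the last recorded one."""
    recorded = []
    for x, y in stream:
        if not recorded:
            recorded.append((x, y))
            continue
        last_x, last_y = recorded[-1]
        if math.hypot(x - last_x, y - last_y) >= tolerance:
            recorded.append((x, y))
    return recorded

# Hypothetical cursor positions every 0.0125 inch along a straight stroke.
stream = [(i * 0.0125, 0.0) for i in range(10)]
sampled = distance_sample(stream, 0.03)
# Every third position is recorded: 0.0375 inch exceeds the tolerance, 0.025 inch does not.
```

A larger tolerance sends fewer points down the line to the host but records less of the curvature, which is the compromise described above.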

Once captured on the ORION, the lines were stored as binary reals (4 bytes). Each line was filtered by the Douglas-Peucker algorithm with a tolerance (half-band) of .005 inch. This reduced file size by about 50%. The data was then converted to ASCII strings, with coordinates written in FORTRAN format F10.4, for transmission to a Digital VAX 11/780.
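
For readers unfamiliar with the filter, the following Python sketch shows the Douglas-Peucker idea with the tolerance treated as a half-band: a point is kept only if it lies farther than the tolerance from the straight line joining the retained points on either side. It is not the FORTRAN code run on the ORION. The function names and sample line are hypothetical, and this variant measures distance to the segment rather than to the infinite line.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def douglas_peucker(points, tolerance):
    """Keep the end points plus every point needed to stay within `tolerance`."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]
    while stack:
        first, last = stack.pop()
        worst, worst_dist = -1, tolerance
        for i in range(first + 1, last):
            d = point_segment_distance(points[i], points[first], points[last])
            if d > worst_dist:
                worst, worst_dist = i, d
        if worst >= 0:
            # The farthest point exceeds the half-band: keep it and split there.
            keep[worst] = True
            stack.append((first, worst))
            stack.append((worst, last))
    return [p for p, k in zip(points, keep) if k]

# Hypothetical digitized line (inches): two nearly straight runs meeting at a corner.
line = [(0.0, 0.0), (1.0, 0.002), (2.0, -0.003), (3.0, 0.001),
        (4.0, 0.0), (4.002, 1.0), (3.999, 2.0), (4.0, 3.0)]
filtered = douglas_peucker(line, 0.005)
# The corner survives; the small wobbles within .005 inch of each run are dropped.
```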

The second digitizing product was the set of labels. On a background of the linework (for graphic orientation), the operator digitized each soil unit label on the map. The operator entered the alphabetic identifier on the keyboard. The file was stored in ASCII and transmitted to the VAX.

The WDNR process performed essentially the same functions with somewhat different equipment. WDNR had a Bendix digitizer connected to a Data General minicomputer with a Tektronix 4014 for graphic feedback. Registration was limited to recording the coordinates of the tick marks in the Bendix table coordinates. Software on the VAX converted these into the same system as the LICGF process.

The software on the Data General (GEdit, written by WDNR) provided a more flexible editing environment than the ORION. In addition, the operator could snap objects closed or trim off overshoots. These capabilities shortened the editing time, but did not affect the quality of the product. The Data General was able to keep up with the Bendix, so that fewer lines had to be fixed in the final inspection against the check plot.

Once transmitted to the VAX, the files were converted into ODYSSEY format with coordinates stored as 32 bit reals (this should have little impact as yet, since each sheet had its own origin). The ODYSSEY PENELOPE program (see Morehouse and Broekhuysen, 1982) was used to convert the spaghetti into a chain file. This processor detects all intersections and labels all polygons. A tolerance of .03 inch (or .02 inch for some of the DNR products) was needed to capture all of the intended intersections. This tolerance ensures that no smaller feature can occur in the file, and that no point comes within the tolerance of another. By this process, duplicate versions of a line, if within the tolerance, will be automatically removed. The numerical nature of the intersection processor has been discussed by Dougenik (1980) and by Chrisman (1983). The tolerance does not act as a traditional "filter" because it does not round off coordinate values; all coordinate positions were in the input file or come from calculated intersections. The intersection calculation is done in a local origin system with one of the points as (0,0) to ensure that precision is not lost.
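
The local origin device mentioned above can be sketched as follows. The sketch shifts all four end points so that one of them is (0,0) before intersecting two segments, which keeps the numbers in the calculation small. It is an illustration only, not the ODYSSEY code; the function name and coordinates are hypothetical, and Python floats are 64 bit, so the precision benefit is far smaller here than with 32 bit reals.

```python
def intersect_local(p1, p2, p3, p4):
    """Intersect segments p1-p2 and p3-p4, working relative to p1.

    Returns the intersection in the original coordinates, or None if
    the segments are parallel or do not cross.
    """
    ox, oy = p1
    # Shift everything so that p1 is (0, 0).
    ax, ay = p2[0] - ox, p2[1] - oy
    bx, by = p3[0] - ox, p3[1] - oy
    cx, cy = p4[0] - ox, p4[1] - oy
    dx, dy = cx - bx, cy - by
    denom = ax * dy - ay * dx
    if denom == 0:
        return None
    t = (bx * dy - by * dx) / denom   # position along p1-p2
    u = (bx * ay - by * ax) / denom   # position along p3-p4
    if not (0.0 <= t <= 1.0 and 0.0 <= u <= 1.0):
        return None
    # Shift the result back to the original coordinate system.
    return (ox + t * ax, oy + t * ay)

# Two hypothetical segments that cross at their midpoints.
crossing = intersect_local((1000.0, 2000.0), (1002.0, 2002.0),
                           (1000.0, 2002.0), (1002.0, 2000.0))
```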

The PENELOPE process produces an error report detailing the following kinds of errors: "dangling chains", caused by undershoots, overshoots or missing lines; and polygons with no label or with two conflicting labels, caused by missing labels, missing lines or extraneous lines. Each file was corrected using the HOMER editor until the error report had nothing further to report. In general, coordinates are copied through these processes without modification. Missing lines were digitized on the TALOS using the process above, and shipped to the VAX. However, the correction of undershoots, for example, requires new coordinates. In some cases, a coordinate value was extracted from the feature that the undershoot should have touched, and in other cases a screen crosshair was used at large magnification. A final stage of editing for unlabelled polygons usually involved the PROTEUS processor aggregation function.
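
The "dangling chain" test can be illustrated with a small Python sketch: once all intersections have been found, every chain end should meet at least one other chain end, so a node touched by only one end marks an undershoot, an overshoot or a missing line. This is not the PENELOPE implementation. The function name and the sample chains are hypothetical, and the sketch assumes that chain ends meeting at a node have exactly equal coordinates.

```python
from collections import Counter

def dangling_nodes(chains):
    """Return the end points that are touched by exactly one chain end."""
    degree = Counter()
    for chain in chains:
        degree[chain[0]] += 1
        degree[chain[-1]] += 1
    return sorted(node for node, count in degree.items() if count == 1)

# A hypothetical triangle of three chains plus one overshoot.
chains = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(1.0, 0.0), (1.0, 1.0)],
    [(1.0, 1.0), (0.0, 0.0)],
    [(1.0, 0.0), (2.0, 0.5)],   # far end touches nothing else
]
dangles = dangling_nodes(chains)
```

A file would pass this particular check when the returned list is empty; the label checks (no label, conflicting labels) need the polygon structure and are not shown.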

Once the file was topologically clean, a check plot was generated on mylar at the original scale. SCS examined each check plot and noted corrections required for geometric fidelity. In some cases, whole files were rejected for gross errors that can be attributed to hardware problems, such as the chemical residues noted above, or to personnel problems, such as lack of training. After the corrections were made, the file was archived as a true copy of the original survey.

 

The goal of the project was to make the soil survey compatible with local land records and other mapping bases, particularly the USGS topographic quadrangles. One part of the project examined the need for analytical removal of relief distortion using the USGS Digital Elevation Matrices (DEM) as a base. This report concerns the less complex approach using photoidentifiable points.

To control the conversion of the inch-space measurements on the soil sheet into a system of geodetically referenceable coordinates, the ticks and section corner marks shown on the soil product were inadequate. The common procedure in such cases is to detect "well-defined" points, such as road intersections, on both the soils map and on another planimetric base such as the USGS topographic quadrangles. The drawback of this approach is that cartographic generalization of roads and other features may degrade the accuracy of the fit. Also, the density of "well-defined" points may not be sufficient for a rigorous transformation, particularly in the rural areas where the soil map coverage is of the greatest interest.

In large portions of the United States, there is a uniformly spaced network of points used to define the Public Land Survey System. These section corners and quarter section corners formed the basis for the control of the Dane County products. Coordinates for the section corners were obtained by methods varying from direct observation with a Macrometer geopositioning receiver through traditional ground survey to manual digitizing from USGS topographic quadrangles (see quality report for USGS PLSS layer). This heterogeneous collection of coordinates is expected to improve over time as land surveying activities continue, so that the quality of the control for the soil survey could also be improved.

The photobase for the soil survey is hardly detailed enough to permit the identification of survey monuments, even if they had been panelled. Instead, the position of the section corner was estimated by using the remonumentation record for each section corner and quarter section corner. This record includes a sketch showing the location of the marker with respect to street pavement, fences, etc. Control was only taken for points identified with reasonable certainty. The number of control points for each soil sheet varied from the maximum of 32 down to 6 when lakes removed large portions of the study area. For full sheets (not involving large amounts of water), the number of control points ran between 15 and 25 in areas where coordinates existed for quarter section corners. In areas using the USGS PLSS, which was only reliable for section corners, the maximum was 12 and the typical values fell around 8. The exact numbers of control points are shown in the appended tables.

Using the control information, a transformation was calculated using a least squares fit to an affine transformation (software written by Cliff Petersohn under the direction of Alan Vonderohe). All calculations were carried out in 64 bit double precision. The fit for each sheet was examined and often a few outliers were discarded. The resulting fits run between 20 and 40 feet of positional error (see figures appended). These values are small, considering the line width of the soil product.
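
A least squares affine fit of this kind can be sketched in Python. The sketch solves the normal equations for the six affine parameters and reports the root mean square of the residual distances at the control points. It is not the software used by the project: the function names, the control coordinates, the errors and the choice of RMS as the summary statistic are all assumptions made for the example.

```python
import math

def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = sum(a[r][k] * x[k] for k in range(r + 1, 3))
        x[r] = (a[r][3] - s) / a[r][r]
    return x

def fit_affine(src, dst):
    """Least squares fit of X = a*x + b*y + c and Y = d*x + e*y + f."""
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations (A^T A) p = A^T b, solved once per output axis.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for axis in (0, 1):
        atb = [sum(r[i] * d[axis] for r, d in zip(rows, dst)) for i in range(3)]
        params.append(solve3(ata, atb))
    return params

def apply_affine(params, point):
    x, y = point
    return tuple(p[0] * x + p[1] * y + p[2] for p in params)

def rms_error(params, src, dst):
    """Root mean square of the residual distances at the control points."""
    sq = 0.0
    for s, d in zip(src, dst):
        fx, fy = apply_affine(params, s)
        sq += (fx - d[0]) ** 2 + (fy - d[1]) ** 2
    return math.sqrt(sq / len(src))

# Hypothetical control: sheet inches to ground feet, with a few feet of error.
src = [(0.0, 0.0), (15.0, 0.0), (15.0, 9.0), (0.0, 9.0), (7.5, 4.5)]
errors = [(3.0, -2.0), (-4.0, 1.0), (2.0, 5.0), (-1.0, -3.0), (0.0, -1.0)]
dst = [(1320.0 * x - 10.0 * y + 500000.0 + ex,
        10.0 * x + 1320.0 * y + 300000.0 + ey)
       for (x, y), (ex, ey) in zip(src, errors)]
params = fit_affine(src, dst)
rms = rms_error(params, src, dst)
```

An outlier check of the kind described would recompute the fit after dropping the control points with the largest residuals and compare the two results.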

Once the separate sheets were placed into a common coordinate system (either State Plane or UTM with a local offset), the adjacent sheets could be merged into a sheetless data base. At first this process was performed by the WHIRLPOOL polygon overlay processor (similar code to PENELOPE discussed above). No matter how well the sheets fit the control, this approach had problems resolving overlaps and gaps between the adjacent sheets. Much manual editing was required to clean up the slivers and overlaps. A new program (written by Kate Beard under the direction of N. Chrisman) was developed to "zip" these sheets together (see separate documentation).

The Dane County soil survey data is delivered in state plane, UTM or geodetic coordinates (latitude, longitude). In all situations there is a local offset to preserve precision. Products in the quad sheet format were created by cutting a rectangle out of the file when stored in geodetic coordinates. This ensures that the sheet borders conform mathematically to the expectation. All conversions between state plane, UTM and geodetic coordinates are performed using software distributed by the National Geodetic Survey. This software contains a disclaimer that it might not work, but the disclaimer was disregarded after samples proved sufficiently accurate. All calculations are carried out in 64 bit double precision, which is more than most of the coordinates processed require.

Positional Accuracy

The positional accuracy of the soil survey can be estimated from two considerations: the base and the interpretations. The base accuracy was estimated by the transformation process described in the lineage report. This does not constitute a test of the digital product, in the sense that the information obtained was used to remove systematic errors. The positional error at control points for each sheet is appended.

Positional accuracy of soil interpretations cannot be determined using the existing standards for positional accuracy tests, because very few points are "well defined". An attempt to test the accuracy of the soils maps was performed as a part of the Dane County Land Records Project (described in greater detail in the DCLRP final report). First, a set of likely areas to test (about 20) was selected. Third order control was established along nearby roads using inertial autosurveyor equipment and personnel loaned by the Bureau of Land Management. These surveys were tied to second order monuments set with Macrometer surveys. Then a field crew of one SCS area supervisor (T. Hoffman) and N. Chrisman constructed the soil map in the field. The soil scientist was told of the general nature of the soil map product for the area, but he did not reconstruct that map. Auger holes were drilled, usually upslope and downslope, until the location of the transition could be approximated. A wood lath was placed in the ground and an uncertainty (ranging from 10 to 50 feet) was estimated. After three full days in the field, only four sites were staked. Surveying crews located the laths relative to the third order control using theodolite and electronic distance meters, with stadia observations as a cross check. The positional errors of the field data fall well within the tolerances specified by the soil scientist.

The results of the study are presented on the maps attached by overplotting the field survey data and the digital soils record. Some of the errors detected are of an attribute identification nature, and reported in the next section. No standard procedure is established to report the positional accuracy of complex curves of this nature when there are uncertainties about all positions. Furthermore, some of the differences are due to cartographic limits at the scale of 1:15840.

Attribute Accuracy

The only testing performed was described above under positional accuracy. Due to the differences of soil naming procedures, the test was not carried out to the level of the specific soil series. The soil scientist would give the important distinguishing characteristic (drainage, slope, mineral/organic ...) and check back to determine if the soil map depicted the same distinction. Of the twenty soil mapping units tested, there were two problems of identification, where the unit was somewhat misclassified. In one case the underlying material (4 feet deep) was lake clay, not a beach deposit. This difference would not alter most surface interpretations of the soil, however. In the other case, the whole polygon belongs in a transition zone and it would be very hard to classify properly. Again, the classification assigned in the map would be approximately correct for many applications. In addition, in the one test of the slope classification, the determination of the higher slopes was marginal when the site was examined on the ground. There may be a bias towards land falling in the lower portions of a given slope class, not the middle. To determine this with more accuracy a more comprehensive test is required, perhaps in comparison to the USGS DEM data.

Logical Consistency


The PENELOPE process and the sheet matching process provided substantial checking of logical consistency. The result is topologically clean as established in the guidelines to the NCDCDS IPS. Some of the errors detected in the PENELOPE process were latent errors from the compilation process, in spite of substantial effort by SCS to color maps by hand. The total count of errors for the first 66 sheets is shown on the map appended. All such errors were removed in the editing process, often with recourse to the manuscript or the advance field sheets. A further, partial check of logical consistency (or perhaps of attribute accuracy) occurs along sheet borders when matched. In most cases, the classifications are identical and the sheet border can vanish. However, some classifications differ and the sheet border has to be retained. Some of these differences are simply a matter of slope category or could be related to scale effects (small polygons on the sheet border are not shown, whereas they might have appeared as a continuation of an adjacent polygon if the sheet boundary had been elsewhere). On average, there is about one problem per sheet match. This could be indicative of attribute errors elsewhere on the sheet, or it could be edge specific. Without further tests, the situation cannot be clarified.

Completeness


The soil maps exhaustively partition the county: all area is assigned to one and only one soil mapping unit. This relation is ensured by the method used to check logical consistency and to match sheet boundaries.

The soil classification has limitations due to mapping rules related to the scale of 1:15840 used for compilation. The line width was approximately 26 feet on the ground, and features were not allowed to become much narrower than 50-80 feet. This rule was not fixed and was not enforced rigidly. Also, the rules tended to generalize areas smaller than an acre or so. Whatever rules were in use are specified in SCS procedures.
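
The relation between map measurements and ground distances at 1:15840 is simple arithmetic: one inch on the map is 15840 inches, or 1320 feet, on the ground. The following Python sketch makes the conversion explicit. The function names are hypothetical, and the sample values are chosen only to put the figures quoted in this report on a common footing.

```python
SCALE = 15840            # denominator of the compilation scale 1:15840
INCHES_PER_FOOT = 12

def ground_feet(inches, scale=SCALE):
    """Ground distance in feet for a length measured on the map in inches."""
    return inches * scale / INCHES_PER_FOOT

def map_inches(feet, scale=SCALE):
    """Length on the map in inches for a ground distance in feet."""
    return feet * INCHES_PER_FOOT / scale

one_inch = ground_feet(1.0)     # 1320 feet
line_02 = ground_feet(0.02)     # about 26.4 feet
narrow_low = map_inches(50.0)   # a 50 foot feature is under 0.04 inch on the map
narrow_high = map_inches(80.0)  # an 80 foot feature is about 0.06 inch on the map
```

On this basis, 26 feet on the ground corresponds to roughly .02 inch on the map, and features 50-80 feet wide are only a few hundredths of an inch wide at compilation scale.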

The soil attributes were checked against a master list of permitted codes and all unknown codes were corrected.