The capture of massive quantities of spatial data, able to be distributed and shared in real time, provides for an ever-increasing range of environmental and societal applications. Data capture includes the principles, methods, technologies, applications, and institutional/programmatic aspects of spatial data acquisition. Sources of data include hardcopy maps, global navigation satellite systems, satellite and aerial sensing, field surveys, land records, socioeconomic data (e.g., census), volunteered geographic information, wireless sensor networks, and unmanned aerial systems.

Source: “Data Capture,” GIS&T Body of Knowledge (ucgis.org)

UNDERSTANDING OF DIGITIZATION AND OTHER MANUAL DATA COLLECTION AND CONVERSION METHODS

Digitization in Geographic Information Systems (GIS) is the process of converting geographic data from hardcopy or printed material into digital form.

KEY CONCEPTS AND TERMINOLOGY

  • Manual Digitizing:
    • Description: Manual digitizing involves copying features from a physical map or image by hand to create a digital file.
    • Method: It is done with a digitizing tablet and a handheld puck, a mouse-like device used to trace features.
    • Accuracy: Manual digitizing can achieve high accuracy.
    • Use Case: Useful when converting paper maps or drawings into digital format.
  • Heads-up Digitizing:
    • Description: Heads-up digitizing involves tracing features on-screen over a scanned, georeferenced map or digital image.
    • Method: The operator works “heads up,” looking at the screen rather than down at a digitizing tablet.
    • Advantages: Requires no special hardware, allows zooming for precision, and spares the original document from repeated handling.
    • Limitations: Accuracy depends on the scan resolution and the quality of the georeferencing.
  • Automatic Digitizing:
    • Description: Automatic digitizing converts raster data (images) to vector data (points, lines, polygons).
    • Purpose: Increases speed and efficiency of GIS data collection (a raster-to-vector sketch appears after this list).
    • Goal: Provide up-to-date spatial data in near real time.
  • Types of Digitizing Errors in GIS:
    • Geodetic Errors: Inaccuracies due to coordinate system transformations.
    • Dangling Nodes: Unconnected endpoints in line features (a detection sketch appears after this list).
    • Switchbacks, Knots & Loops: Overlapping or tangled lines.
    • Overshoots and Undershoots: Features extending beyond or falling short of their intended boundaries.
    • Sliver Polygon: A small, narrow polygon created where the boundaries of adjacent features fail to coincide exactly.
  • Primary data - collected specifically for the purpose of a researcher’s particular study
    • Physical Measurement - recording physical properties of the earth or its inhabitants - size, number, temperature, chemical makeup, moisture, etc.
    • Observation of behavior - observable actions or activities of individuals or groups - not thoughts, feelings, or motivations
    • Archives - records that have been collected primarily for non-research purposes (secondary)
    • Explicit reports - beliefs people express about things (e.g., surveys)
    • Computational Modeling - models as simplified representations of portions of reality
  • Secondary data - collected for another purpose by someone other than the researcher.
  • 5 types of measurement - physical measurement, observation of behavior, archives, explicit reports, computational modeling
  • Quantitative data - numerical values measured at the ordinal level or higher (ordinal, interval, or ratio/metric).
  • Qualitative data - nonnumerical or numerical (nominal) values that have no quantitative meaning.
  • Deceptive mapping - maps can be distorted for propaganda, for military security, or simply through the mapmaker’s ignorance.
  • Layer – mechanism to display geographic datasets.
  • Data Transfer Standards
    • Transfer - the Spatial Data Transfer Standard (SDTS), Federal Information Processing Standard 173, is a robust way of transferring GIS data between computers with no information loss, including metadata.
    • Industry Standards - typically exchange only graphic information, not topology; a large number of format translators exist.
    • Open Geospatial Consortium (OGC, formerly the Open GIS Consortium) – a non-profit, international, voluntary consensus standards organization; created the Geography Markup Language (GML), an XML-based encoding standard (a GML export sketch follows this list).
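
As referenced under Automatic Digitizing above, raster-to-vector conversion can be scripted. Below is a minimal sketch using GDAL/OGR’s Polygonize, assuming a single-band classified raster; the file, layer, and field names are illustrative.

```python
# A hedged raster-to-vector sketch: contiguous cells with the same value
# become polygons. "classified.tif" and the output names are placeholders.
from osgeo import gdal, ogr, osr

src = gdal.Open("classified.tif")
band = src.GetRasterBand(1)
srs = osr.SpatialReference(wkt=src.GetProjection())

drv = ogr.GetDriverByName("GPKG")
out_ds = drv.CreateDataSource("classified.gpkg")
layer = out_ds.CreateLayer("classes", srs=srs, geom_type=ogr.wkbPolygon)
layer.CreateField(ogr.FieldDefn("class_id", ogr.OFTInteger))

# Emit one polygon per contiguous region of equal cell value; the cell
# value is written to field index 0 ("class_id").
gdal.Polygonize(band, None, layer, 0)
out_ds = None  # flush to disk
```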
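
For the digitizing errors above, a dangling-node check can also be scripted. This is a minimal sketch with shapely; the tolerance and the sample geometry are illustrative, and legitimate dead ends (e.g., cul-de-sacs) would need to be whitelisted in practice.

```python
# A minimal dangling-node check: an endpoint "dangles" if it touches no
# other line in the dataset within a snapping tolerance.
from shapely.geometry import LineString, Point

lines = [
    LineString([(0, 0), (1, 0)]),
    LineString([(1, 0), (1, 1)]),
    LineString([(2, 0), (3, 0)]),   # isolated segment: both ends dangle
]

TOLERANCE = 1e-6  # tune to the dataset's coordinate precision

def dangling_nodes(lines, tol=TOLERANCE):
    """Return endpoints that touch no other line in the dataset."""
    dangles = []
    for i, line in enumerate(lines):
        for coord in (line.coords[0], line.coords[-1]):
            pt = Point(coord)
            touches_other = any(
                pt.distance(other) <= tol
                for j, other in enumerate(lines) if j != i
            )
            if not touches_other:
                dangles.append(pt)
    return dangles

print([(p.x, p.y) for p in dangling_nodes(lines)])
# Flags (0,0), (1,1), (2,0), and (3,0); (1,1) is a genuine dead end,
# which a real QC workflow would review rather than auto-delete.
```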
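
As noted under Data Transfer Standards, GML is an XML-based encoding for exchanging vector data. A hedged export sketch with geopandas follows, assuming the installed GDAL build includes the GML driver; the file names are placeholders.

```python
# Exporting a layer to OGC GML via geopandas, which delegates to
# GDAL/OGR's GML driver. "parcels.shp" is a placeholder input.
import geopandas as gpd

gdf = gpd.read_file("parcels.shp")          # any OGR-readable source
gdf.to_file("parcels.gml", driver="GML")    # XML-based; schema emitted as .xsd
```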

SAMPLE QUESTION

What is the process of digitizing in GIS?

A) Converting geographic data from vector to raster format.

B) Creating topographic maps from satellite imagery.

C) Converting features from a hardcopy or scanned image into vector data by tracing.

D) Generating 3D models from elevation data.

Answer: C) Converting features from a hardcopy or scanned image into vector data by tracing.

Explanation: Digitizing involves capturing geographic features by tracing them from maps or images, resulting in point, line, or polygon data in vector format. It’s a fundamental step in creating accurate GIS datasets.

KNOWLEDGE OF FIELD DATA COLLECTION

SAMPLE QUESTION

Which of the following methods is commonly used for field data collection in GIS?

A) Adding geotagged photos as “photos with locations” to an online web map.

B) Collecting a GPX file from GPS receivers and smartphone fitness apps.

C) Generating a table in CSV or TXT format and adding it to an online web map.

D) Using a mobile device such as a smartphone or tablet paired with a Bluetooth-connected GNSS receiver.

Answer: D) Using a mobile device such as a smartphone or tablet paired with a Bluetooth-connected GNSS receiver.

KNOWLEDGE OF AUTOMATED DATA COLLECTION AND CONVERSION METHODS

Automated data collection and conversion takes many forms and includes the use of various methodologies, instruments or sensors, and software tools for capturing and converting data for use in GIS. Often data starts out as “non-spatial” but is “spatially enabled” during the conversion process, sometimes referred to as ETL (Extract, Transform, Load); a small example of spatially enabling tabular data follows.
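
A minimal sketch of that “spatially enabling” step: tabular records with coordinate columns become a point layer. The file and column names (“sites.csv”, “lon”, “lat”) are hypothetical.

```python
# A CSV with latitude/longitude columns becomes a point layer.
import pandas as pd
import geopandas as gpd

df = pd.read_csv("sites.csv")
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["lon"], df["lat"]),
    crs="EPSG:4326",  # WGS84; match the CRS the coordinates were captured in
)
gdf.to_file("sites.gpkg", layer="sites", driver="GPKG")
```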

KEY CONCEPTS AND TERMINOLOGY

  • Feature Extraction: Feature extraction refers to the process of transforming raw data (such as satellite imagery, LiDAR point clouds, or other geospatial data) into meaningful features that can be used for analysis, visualization, or modeling. Often, these features represent specific objects, patterns, or characteristics within the data.
    • Methods of Feature Extraction in GIS:
      • Manual Feature Extraction:
        • Description: Human analysts manually identify and delineate features of interest.
        • Use Cases: Identifying building footprints, roads, rivers, or land cover types.
        • Advantages: High accuracy but time-consuming.
      • Automated Feature Extraction:
        • Description: Algorithms and computational techniques automatically detect and extract features.
        • Examples:
          • Deep Learning: Neural networks analyze imagery to identify objects, classify pixels, or detect changes.
          • Pattern Recognition: Algorithms recognize specific shapes or textures.
          • Segmentation: Dividing an image into meaningful regions.
        • Advantages: Faster processing, especially for large datasets (an NDVI classification sketch appears after this list).
    • Applications of Feature Extraction:
      • Land Cover Classification: Extracting land cover types (forests, urban areas, water bodies) from satellite imagery.
      • Object Detection: Identifying specific objects (cars, buildings, trees) in aerial photos.
      • Change Detection: Comparing features over time to detect alterations (urban expansion, deforestation).
      • Terrain Modeling: Extracting elevation contours, slope, or aspect from LiDAR data.
  • Data or Web Scraping:
    • Description: This method involves extracting data from websites whose content is intended for human readers rather than machines.
    • Process: Automated tools visit websites, analyze their content, and extract relevant data, like a digital “scraping” of information (a minimal example appears after this list).
    • Use Cases:
      • Collecting product prices from e-commerce websites.
      • Extracting news headlines from various news portals.
    • Advantages:
      • Efficient for large-scale data extraction.
      • Useful for monitoring changes on websites.
  • Using APIs (Application Programming Interfaces):
    • Description: APIs allow software applications to communicate with each other.
    • Process: Developers use APIs to retrieve specific data from online services or databases (see the request sketch after this list).
    • Use Cases:
      • Fetching weather data from a weather service API.
      • Accessing social media data (e.g., Twitter API).
    • Advantages:
      • Structured and reliable data.
      • Direct access to specific information.
  • Remote Sensing: the process of employing an instrument to acquire information about an object or phenomenon without making physical contact with it. Unlike in situ (on-site) observation, remote sensing gathers data from a distance.
  • ETL (Extract, Transform, Load): a fundamental data integration process used to combine data from multiple sources into a consistent format for loading into a data warehouse, data lake, or other target system.
    • Extract
      • During the extraction phase, raw data is copied or exported from various source locations (such as databases, CRM systems, flat files, web pages, etc.) to a staging area.
      • Data can be both structured (e.g., SQL databases) and unstructured (e.g., web pages).
      • The goal is to gather relevant data for further processing.
    • Transform
      • In the staging area, the raw data undergoes data processing.
      • Transformation involves:
        • Cleaning: Removing inconsistencies, errors, and duplicates.
        • Sanitizing: Ensuring data quality by standardizing formats.
        • Aggregating: Combining data from different sources.
        • Enriching: Adding additional information (e.g., calculating derived metrics).
      • The transformed data is then ready for its intended analytical use case.
    • Load
      • In this final step, the cleaned and transformed data is loaded into a target database (such as a data warehouse).
      • The data is organized and structured for efficient querying and reporting.
      • Loading can happen incrementally or in batch mode (a compact end-to-end ETL sketch follows this list).
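
For the automated feature extraction methods above, here is a minimal land cover extraction sketch: thresholding NDVI to separate vegetation from everything else. The band file names and the 0.3 threshold are illustrative assumptions, not standards.

```python
# Threshold NDVI to pull a "vegetation" class out of multispectral imagery.
import numpy as np
import rasterio

with rasterio.open("red_band.tif") as red_src, rasterio.open("nir_band.tif") as nir_src:
    red = red_src.read(1).astype("float32")
    nir = nir_src.read(1).astype("float32")
    profile = red_src.profile

# NDVI = (NIR - Red) / (NIR + Red); guard against division by zero
ndvi = (nir - red) / np.where((nir + red) == 0, 1, nir + red)
vegetation = (ndvi > 0.3).astype("uint8")   # 1 = vegetation, 0 = other

profile.update(dtype="uint8", count=1)
with rasterio.open("vegetation.tif", "w", **profile) as dst:
    dst.write(vegetation, 1)
```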
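
For web scraping, a minimal sketch with requests and BeautifulSoup; the URL and the CSS selector are placeholders, and a site’s terms of service should be checked before scraping it.

```python
# Fetch a page meant for human readers and pull out text of interest.
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/news", timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# ".headline" is a hypothetical CSS class; real sites differ.
headlines = [h.get_text(strip=True) for h in soup.select(".headline")]
print(headlines)
```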
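
For API-based collection, a sketch of fetching structured JSON; the endpoint, parameters, and response fields are hypothetical, since each real service documents its own.

```python
# Request structured data from a (hypothetical) weather API.
import requests

resp = requests.get(
    "https://api.example.com/v1/weather",       # placeholder endpoint
    params={"lat": 41.88, "lon": -87.63, "units": "metric"},
    timeout=10,
)
resp.raise_for_status()
data = resp.json()                  # structured, machine-readable response
print(data.get("temperature"))      # hypothetical response field
```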
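
And for ETL, a compact end-to-end sketch with pandas, with a CSV standing in for the source system and SQLite for the target warehouse; file, column, and table names are illustrative.

```python
# Extract -> Transform -> Load in miniature.
import sqlite3
import pandas as pd

# Extract: copy raw records from a source file into memory (the "staging area")
raw = pd.read_csv("source_records.csv")

# Transform: clean, sanitize, and deduplicate
clean = (
    raw.dropna(subset=["id"])                                      # drop incomplete rows
       .assign(name=lambda d: d["name"].str.strip().str.title())   # standardize formats
       .drop_duplicates(subset=["id"])                             # remove duplicates
)

# Load: write the transformed data into the target database
with sqlite3.connect("warehouse.db") as conn:
    clean.to_sql("records", conn, if_exists="replace", index=False)
```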

SAMPLE QUESTION

Which of the following statements accurately describes automated data collection in GIS?

A) Automated data collection involves manually recording empirical observations in the field.

B) Automated collection refers to converting legacy data into digital format.

C) Automated data collection includes sensor-derived data and obtaining existing data from other sources.

D) Automated collection primarily relies on geotagged photos.

Answer: C) Automated data collection includes sensor-derived data and obtaining existing data from other sources, employing hardware and software without human intervention or manual processes. This method leverages technology and tools to efficiently collect and integrate geographic information.

Explanation: Automated data collection plays a crucial role in modern GIS workflows, allowing for efficient and accurate acquisition of spatial data from various sources.

KNOWLEDGE OF REMOTELY SENSED DATA SOURCES AND COLLECTION METHODS

Remotely sensed data refers to information acquired from a distance using sensors on satellites and aircraft. It involves detecting and monitoring physical characteristics of an area by measuring reflected or emitted radiation without direct physical contact with the object.

Characteristics of remotely sensed data include:

Sensors: Instruments such as cameras, radiometers, and laser scanners collect remotely sensed measurements.

Distance: Data is acquired from a distance, typically from satellites or aircraft.

Radiation: Reflected or emitted energy (such as visible light, infrared, or microwave) is detected and recorded.

KEY CONCEPTS AND TERMINOLOGY

  • Remote Sensing: three key resolutions: spatial (ground area represented by each pixel), spectral (portion of the electromagnetic spectrum measured), and temporal (repeat cycle)
  • Aerial photography and satellite imagery
  • Passive sensors: gather radiation that is emitted from objects. Photography, infrared, radiometers
  • Active sensors: emit energy and measure the amount of energy bounced back from objects.
  • RADAR: acronym for radio detection and ranging; an electromagnetic sensor system used for detecting, locating, tracking, and recognizing objects at considerable distances.
  • LiDAR: acronym for light detection and ranging; a remote-sensing technology that uses laser pulses to measure precise distances and movement in an environment, in real time. It operates by targeting an object or surface with a laser and measuring the time the reflected light takes to return to the receiver (the round-trip range formula is sketched after this list).
  • Multispectral scanning: a remote-sensing technique, used for Earth observation, that captures data across multiple spectral bands simultaneously. The Landsat program employs scanners of this type.
  • Infrared Imaging: also known as thermal imaging; a non-invasive technique that uses infrared sensors to detect heat emitted by objects.
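
As referenced in the LiDAR entry above, active ranging sensors convert a pulse’s round-trip travel time into distance: range = (speed of light * elapsed time) / 2. A tiny worked example, with an illustrative echo time:

```python
# Time-of-flight ranging behind LiDAR and RADAR: the pulse travels out
# and back, so range = (c * t) / 2.
C = 299_792_458.0  # speed of light, m/s

def range_from_echo(t_seconds: float) -> float:
    """Distance to target from round-trip pulse time."""
    return C * t_seconds / 2.0

print(range_from_echo(6.67e-6))  # ~1000 m for a ~6.67 microsecond round trip
```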

SAMPLE QUESTION

Which of the following statements accurately describes the sources of remotely sensed data in GIS?

A) NASA Earth Observation (NEO) provides free satellite imagery.

B) USGS Earth Explorer offers access to historical aerial photographs.

C) ESA’s Sentinel data includes radar and optical imagery.

D) All of the above.

Answer: D) All of the above. Each of these sources provides data captured by remote-sensing platforms such as satellites or aircraft-based imaging systems. NEO offers a wealth of Earth observation data, including multispectral and hyperspectral imagery, which is valuable for various GIS applications.

Remember, these data sources play a crucial role in understanding our planet’s dynamics and supporting informed decision-making!

KNOWLEDGE OF ACQUISITION, USE, AND LIMITATIONS OF CROWDSOURCED AND OPEN-SOURCE DATA AND SERVICES

Crowdsourced data refers to information, opinions, or work that is collected from a large group of people. This data is typically sourced via the Internet, social media platforms, and smartphone apps.

Open-source data refers to information that can be freely used, re-used, and redistributed by anyone, subject at most to requirements of attribution and share-alike licensing. Many public and private providers offer open data, which can either be downloaded or accessed directly via web services such as the Web Map Service (WMS). It is important to carefully review any open-source data to ensure its accuracy and usability. Examples of common web services open to the public are Microsoft’s Bing Maps services and the USGS National Map services.

KEY CONCEPTS AND TERMINOLOGY

  • Web Map Service (WMS): A WMS is a standard protocol developed by the Open Geospatial Consortium (OGC) in 1999 for serving georeferenced map images (rendered raster pictures of the data) over the web (a request sketch appears after this list).
  • Web Feature Service (WFS): A WFS supports interactive capabilities such as querying, filtering, and sorting. Unlike WMS, a WFS gives access to the vector data itself (not rendered raster images).
  • Web Coverage Service (WCS): Like a WFS, a WCS returns underlying data rather than a rendered picture; in this case, multidimensional raster data (coverages).
  • GeoServices REST Specification: The GeoServices REST Specification provides an open way for web clients to communicate with GIS servers by issuing requests through structured URLs. The server responds with map images, text-based geographic information, or other resources that satisfy the request.
  • Collection Methods:
    • Crowdsourcing involves obtaining data from a diverse group of individuals who voluntarily contribute their insights or perform specific tasks.
    • Examples include self-reported accident updates on traffic apps like Waze, where drivers share real-time information with other users.
  • Variety of Contributors
    • People involved in crowdsourcing may work as paid freelancers or contribute voluntarily.
    • The crowd can consist of individuals with different skills, backgrounds, and perspectives from all over the world.
  • Advantages
    • Cost Savings: Companies can save time and money by outsourcing work to a distributed crowd rather than maintaining in-house employees.
    • Skill Diversity: Crowdsourcing allows tapping into a vast array of skills and expertise.
    • Real-Time Data: Crowdsourced data can provide up-to-date information due to its dynamic nature.
  • Limitations and drawbacks
    • Quality and Accuracy:
      • Variability: Crowdsourced data can be inconsistent in quality due to the diverse backgrounds and expertise of contributors.
      • Misinformation: Incorrect or biased information may spread through crowdsourcing platforms, affecting the overall accuracy of the data.
    • Bias and Representativeness:
      • Selection Bias: The crowd may not represent the entire population, leading to skewed results.
      • Demographic Bias: Certain demographics (e.g., tech-savvy individuals) may be overrepresented, while others are underrepresented.
      • Cultural Bias: Cultural differences can impact the interpretation of tasks or questions.
    • Privacy and Security:
      • Data Privacy: Crowdsourced data often involves personal information. Ensuring privacy and protecting sensitive data can be challenging.
      • Security Risks: Data breaches or misuse can occur if security measures are inadequate.
    • Motivation and Incentives:
      • Intrinsic vs. Extrinsic Motivation: Contributors may participate for different reasons (e.g., altruism, financial gain). Incentives can affect data quality.
      • Free-Riding: Some contributors may benefit without actively contributing, relying on others’ efforts.
    • Task Complexity:
      • Complex Tasks: Crowdsourcing is better suited for simple, well-defined tasks. Complex tasks may require specialized expertise that the crowd lacks.
    • Lack of Context:
      • Contextual Understanding: Contributors may lack context, leading to incomplete or inaccurate responses.
      • Ambiguity: Ambiguous tasks can result in varied interpretations.
    • Cost and Time:
      • Aggregation Effort: Curating and validating crowdsourced data can be time-consuming and costly.
      • Revisions: Iterative revisions may be necessary to improve data quality.
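
As referenced in the WMS entry above, OGC web services are driven by structured URLs. Here is a hedged GetMap request sketch; the base URL is a placeholder, and real layer names and supported CRSs come from the service’s GetCapabilities response. (A GeoServices REST request is similar in spirit, e.g., a MapServer /export URL with an f=json parameter.)

```python
# Request a rendered map image from a WMS endpoint via a structured URL.
import requests

params = {
    "service": "WMS",
    "version": "1.3.0",
    "request": "GetMap",
    "layers": "0",                      # placeholder layer name
    "crs": "EPSG:4326",
    "bbox": "38.9,-77.1,39.0,-77.0",    # lat/lon axis order in WMS 1.3.0 for EPSG:4326
    "width": 512,
    "height": 512,
    "format": "image/png",
}
resp = requests.get("https://example.com/geoserver/wms", params=params, timeout=30)
resp.raise_for_status()
with open("map.png", "wb") as f:
    f.write(resp.content)
```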

SAMPLE QUESTION

Which of the following statements accurately describes limitations of crowdsourced data?

A) Results can be easily skewed based on the crowd being sourced.

B) Lack of confidentiality or ownership of an idea.

C) Potential to miss the best ideas, talent, or direction and fall short of the goal or purpose.

D) All of the above. 

Answer: D) All of the above. Crowdsourcing, while valuable, has its limitations, including potential biases, lack of confidentiality, and the risk of missing critical insights.