Preparing GeoData

Data Conversion

Converting between data formats can be very handy. For example, to work with GML data in Python using the geopandas library, it is often convenient to convert it to GeoJSON first.

Many existing file formats were invented by GIS software developers, often in a closed-source environment. This led to the large number of formats on offer today, and considerable problems transferring data between software environments. The Geospatial Data Abstraction Library (GDAL) is an open-source answer to this issue.

From Geospatial-Python Software Carpentry: “GDAL is a set of software tools that translate between almost any geospatial format in common use today (and some not so common ones). GDAL also contains tools for editing and manipulating both raster and vector files, including reprojecting data to different CRSs. GDAL can be used as a standalone command-line tool, or built in to other GIS software. Several open-source GIS programs use GDAL for all file import/export operations.”

In particular, the GDAL command-line tool ogr2ogr is useful for converting between vector file types. Note that it takes the output file first and the input file second. For example, to convert a GML file to GeoJSON, you could use:

ogr2ogr -f "GeoJSON" ../data/TOP10NL_37O.geojson ../data/TOP10NL_37O.gml

For more examples, see the ogr2ogr page in the GDAL documentation.
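
If the conversion has to happen from inside a Python script, one option is to call the same tool through the standard library. The sketch below is only an illustration and assumes ogr2ogr is installed and on the PATH; the helper names build_ogr2ogr_command and convert are made up for this example.

```python
import shutil
import subprocess


def build_ogr2ogr_command(src, dst, driver="GeoJSON"):
    """Return the ogr2ogr argument list: output file first, then input."""
    return ["ogr2ogr", "-f", driver, dst, src]


def convert(src, dst, driver="GeoJSON"):
    """Run ogr2ogr; raises if the tool is missing or the conversion fails."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr not found; install GDAL first")
    subprocess.run(build_ogr2ogr_command(src, dst, driver), check=True)


# Show the command that would be run for the GML-to-GeoJSON example.
cmd = build_ogr2ogr_command(
    "../data/TOP10NL_37O.gml", "../data/TOP10NL_37O.geojson"
)
print(" ".join(cmd))
```

Calling convert("../data/TOP10NL_37O.gml", "../data/TOP10NL_37O.geojson") would then perform the conversion itself, provided the input file exists.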

Data cleaning

From datacarpentry.org: “A part of the data workflow is preparing the data for analysis. Some of this involves data cleaning, where errors in the data are identified and corrected or formatting made consistent. This step must be taken with the same care and attention to reproducibility as the analysis.

OpenRefine (formerly Google Refine) is a powerful free and open source tool for working with messy data: cleaning it and transforming it from one format into another.

This lesson will teach you to use OpenRefine to clean and format data effectively and automatically track any changes that you make. Many people comment that this tool saves them literally months of work trying to make these edits by hand.”

View the free online course material from datacarpentry.org.

Head directly to openrefine.org
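
To give a flavour of what "making formatting consistent" can mean in practice, here is a small sketch that groups spelling variants of the same place name. It is loosely modelled on the key-collision ("fingerprint") clustering idea that OpenRefine offers, but it is a simplified approximation written for this page, not OpenRefine's actual implementation. The function names and sample values are illustrative.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Normalise a string: trim, lowercase, drop punctuation, sort unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; keep only groups with variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: vals for key, vals in groups.items() if len(set(vals)) > 1}


# Illustrative input: three spellings of one place plus an unrelated one.
names = ["Den Haag", "den haag ", "Haag, Den", "Utrecht"]
print(cluster(names))
```

Each group returned by cluster is a candidate for merging into one canonical spelling; a person should still review the groups, since different places can share a fingerprint.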

Aligning projections

From Geospatial-Python Software Carpentry: “If you loaded two rasters with different projections in QGIS 3 or ArcMap/ArcPro, you’d see that they would align since these software reproject “on-the-fly”. But with R or Python, you’ll need to reproject your data yourself in order to plot or use these rasters together in calculations. We can use the CRS attribute from one of our datasets to reproject the other dataset so that they are both in the same projection.”
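
In practice a geospatial library reprojects whole datasets for you, including the resampling that rasters need. To make concrete what reprojecting a single coordinate involves, the sketch below converts WGS84 longitude/latitude (EPSG:4326) to Web Mercator (EPSG:3857) using the spherical Mercator formula. The choice of these two CRSs and the function name are assumptions made for illustration; they are not taken from the lesson quoted above.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used by EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project lon/lat in degrees (EPSG:4326) to Web Mercator metres (EPSG:3857)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


# Example: a point at roughly 5 degrees east, 52 degrees north.
print(lonlat_to_web_mercator(5.0, 52.0))
```

Two datasets can only be overlaid or combined in calculations once all their coordinates are expressed in the same CRS, which is why one dataset is reprojected to match the other.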

Georeferencing

Batch geocoding

Batch geocoding tools let you enter a list of place names (typed in directly or uploaded as a .csv file) and automatically return a best-estimate latitude and longitude for each. Here is one such tool: https://www.findlatitudeandlongitude.com/batch-geocode/
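
The same workflow can be sketched in a few lines of Python. The example below reads place names from CSV text and attaches coordinates to each row. The lookup function and its small gazetteer are hypothetical stand-ins with approximate coordinates; a real workflow would pass in a function that queries a geocoding service instead.

```python
import csv
import io

# Hypothetical stand-in gazetteer with approximate coordinates; a real
# workflow would query a geocoding service instead.
GAZETTEER = {
    "amsterdam": (52.37, 4.90),
    "utrecht": (52.09, 5.12),
}


def lookup(place):
    """Return (lat, lon) for a place name, or None if it is unknown."""
    return GAZETTEER.get(place.strip().lower())


def batch_geocode(csv_text, geocoder=lookup, column="place"):
    """Geocode every row of a CSV; unknown places get empty lat/lon fields."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        coords = geocoder(row[column])
        lat, lon = coords if coords else ("", "")
        results.append({**row, "lat": lat, "lon": lon})
    return results


sample = "place\nAmsterdam\nUtrecht\nAtlantis\n"
for record in batch_geocode(sample):
    print(record)
```

Rows that cannot be matched are kept with empty coordinates rather than dropped, so they can be checked by hand afterwards.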

Geo Annotating

Resources:

  • GeoAnnotate is a JavaScript application built to collect toponym and document-level geographic annotations. It is designed to work with Parse (https://www.parse.com/), a free backend service. It is not currently hosted on any publicly accessible server; the source code is available in its GitHub repository.

Object recognition

Resources:

Automatic Vectorization

Automatic Modeling