Importing CDSM data into PostGIS (Take #2)
This one is for Sindile who pointed out that my last post on the topic wasnt very well explained. Here is my script again with a few improvements to deal better with the few cases where more than a single degree square is covered by a cell. I’ve also commented each line which hopefully will make what is going on a little clearer.
#!/bin/bash
# A script to import all Chief Directorate : Surveys and Mapping
# vector tiles into PostGIS.
#
# Tim Sutton 2009
#
# Lines prefix with '#' are comments.
# Drop the database if it already exists!
dropdb cdsm50k
# Now recreate the database
createdb cdsm50k
# Install the plpgsql procedural language into Postgres
# This is required before we can install the PostGIS database extensions
createlang plpgsql cdsm50k
# Install the PostGIS extention. Note you may need to adjust the path
# to lwpostgis.sql if you are using a different version of linux
psql cdsm50k < /usr/share/postgresql-8.3-postgis/lwpostgis.sql
# Install the PostGIS spatial reference system tables.
# Once again, you may need to adjust for your system
psql cdsm50k < /usr/share/postgresql-8.3-postgis/spatial_ref_sys.sql
# We assume you have a directory containing many subdirectories labeled
# after a Degree Square e.g. 2218, 2219 etc directories.
# First we will find all shapefiles within those directores and then
# loop through the resulting list of shapefile names. The list will be
# something like this:
#./3229/3229_VEGETATION_POINT_2006_04.shp
#./3229/3229_WATER_SOURCE_POINT_2006_04.shp
#./3317/3317BB_3318AA_ARTIFICIAL_SURFACE_AREA_2006_06.shp
# etc.
# As you can see there are sometimes shp files that cover more than
# one grid square, which we will account for in our script below.
MYSTART=`date`
# We will use this file (commandlog.txt) to keep a
# reproducable list of every command that was run to import data
echo "" > commandlog.txt
for MYPATH in `find . -name *.shp`
do
# Print the name of the file we are currently working on to screen
echo $MYPATH
# Get the name of the file without its directory and without its .shp
# extension. The will change e.g. ./3229/3229_VEGETATION_POINT_2006_04.shp
# to 3229_VEGETATION_POINT_2006_04 and store it in variable 'FILE'
FILE=`basename $MYPATH .shp`
# Now we want to get rid of the Degree cell number prefix from the front
# of the file name so that 3229_VEGETATION_POINT_2006_04 will be
# changed to VEGETATION_POINT_2006_04. The line below also accounts for
# when we have a 'two cell' layer like: 3317BB_3318AA_ARTIFICIAL_SURFACE_AREA_2006_06
# which afterwards will become ARTIFICIAL_SURFACE_AREA_2006_06.
# The result of this step is stored in the variable 'LAYER'
LAYER=`echo $FILE | sed 's/^[0-9]*..\_//g' | sed 's/^[0-9]*..\_//g' | sed 's/^[0-9]*\_//g'`
# Now lets get rid of that date at the end. So ARTIFICIAL_SURFACE_AREA_2006_06
# will become ARTIFICIAL_SURFACE_AREA. This again be stored as 'LAYER'
# replacing the old value.
LAYER=`echo $LAYER | sed 's/\_....\_..$//g'`
# Now lets get rid of any underscores and save the result again in 'LAYER'
# So ARTIFICIAL_SURFACE_AREA will become ARTIFICIALSURFACEAREA.
LAYER=`echo $LAYER | sed 's/\_//g'`
# If there are still any numbers left in our name (there shouldnt be)
# lets get rid of them.
LAYER=`echo $LAYER | sed 's/[0-9]//g'`
# Now lets convert the name to lower case so that ARTIFICIALSURFACEAREA
# becomes artificialsurfacearea. Once again its stored as 'LAYER'
LAYER=`echo $LAYER | tr "[:upper:]" "[:lower:]"`
# Now find out the directory name from the original file name. So
# if we had ./3229/3229_VEGETATION_POINT_2006_04.shp to start with,
# DIR will store ./3229/ afterwards.
DIR=`echo $MYPATH | sed 's/....\_.*$//g'`
# The next three lines just print the result of the above steps to
# the screen for information...
echo "File : $FILE"
echo "Layer: $LAYER"
echo "Dir : $DIR"
# Now we want to see if our database already has a table for our layer
# (e.g. artificialsurfacearea)
if [ `echo "\d" | psql cdsm50k | grep -o "${LAYER} "` ]
then
# Ok we found the table already exists so the '-a' option in
# shp2pgsql tells shp2pgsql to append the data to the existing table
# -s 4326 tells it the CRS should be lat long / wgs84
# -W UTF-8 tells it to load text data as UTF-8 rather than unicode
echo "Database table exits, appending..."
shp2pgsql -a -s 4326 -W UTF-8 $MYPATH $LAYER 2>/dev/null | psql -q cdsm50k >/dev/null 2>&1
echo "shp2pgsql -a -s 4326 -W UTF-8 $MYPATH $LAYER 2>/dev/null | psql -q cdsm50k >/dev/null 2>&1" >> commandlog.txt
else
# In the alternate case, the table doesnt already exist so we rather create
# a new table named after the layer using '-I'
echo "Database table does not exit, creating..."
shp2pgsql -I -s 4326 -W UTF-8 $MYPATH $LAYER 2>/dev/null | psql -q cdsm50k >/dev/null 2>&1
echo "shp2pgsql -I -s 4326 -W UTF-8 $MYPATH $LAYER 2>/dev/null | psql -q cdsm50k >/dev/null 2>&1" >> commandlog.txt
fi
# Ok now we loop on through to the next layer until we are all done...
done
MYEND=`date`
echo "Import started at ${MYSTART}"
echo "Import ended at ${MYEND}"
Posted on Sunday April 5th