Computing a shapefile for Belgian zip codes

Home

Abstract:

In most datawarehouses, the lowest detail level for geographical analysis is the zipcode*sub-city; but as in practice the sub-city is generaly encoded in a poor way; only the zipcode is usable.

The National Geography Institute provide for 100€ shapefile data at the 'statistical sector' level that can be processed to get data at zipcode level at http://www.ngi.be/FR/FR1-5-2.shtm

Checking-out the source data with QGIS

These are the most important files in the archive, each dbf,shp,prj,qpj,shx tuple makes one 'shapefile'.
The .dbf file is a plain dBase III+ file that can be easilly read with various tools like LibreOffice or even with an hexadecimal editor as this is position-based.
5,4M AD_0_StatisticSector_WSG84.dbf
4,0K AD_0_StatisticSector_WSG84.prj
4,0K AD_0_StatisticSector_WSG84.qpj
 82M AD_0_StatisticSector_WSG84.shp
160K AD_0_StatisticSector_WSG84.shx
1,1M AD_1_MunicipalSection_WSG84.dbf
4,0K AD_1_MunicipalSection_WSG84.prj
4,0K AD_1_MunicipalSection_WSG84.qpj
 43M AD_1_MunicipalSection_WSG84.shp
 24K AD_1_MunicipalSection_WSG84.shx

Caveat: The zipcode at the MunicipalSection level is the 'main' zipcode for this section. In this example I see that a '4020 Liége' combination is missing from the source data.

attributes in shape file

Checking further with QGIS, I can see the the left bank of river Meuse (4000) and the right bank (4020) have been lumped together in a single shape.

map of Liege

We have to go deeper: statistical sector level

This is the same city of Liége (62063) we now have all the statistical levels & we can visualize those with QGIS; but there are no zipcode available in the IGN data at this level.

map of Liege

By overlaying the two layers in transparent mode & combine those with external knoweledge, we can see that all 'A**' sectors map to '4000' and all 'B**' sectors map to '4020'.
More peculiar cases were manually documented in an exception file.

map of Liege

Re-Processing the data with PostGIS

All of these tools are available on Debian, as well as on RaspBian.

Note: A Raspberry PI is perfectly enough to process this data, but I would recommend to mount /var/lib/postgresql/9.5/main/ over a tmpfs to avoid SD-Card wear an increase speed.
To achieve this you'd need to modify the initscript to rsync the data from persistant storage at PostgreSQL start and to resync it back at PostgreSQL stop + every hour in a cron job too.


This is nowhere near a turn-key solution, you still need some knowledge of PostgreSQL & PostGIS administration.
zipcode.txt is simply a SELECT DISTINCT of all the valid zipcodes from the datawarehouse.


gis.sh


#!/bin/sh
unzip shape.zip AD_1_*
unzip shape.zip AD_0_*
psql -d gis -c 'DROP TABLE ad_1_municipalsection_WSG84;'
psql -d gis -c 'DROP TABLE ad_0_statisticsector_WSG84;'
export SHAPE_ENCODING="ISO-8859-1"
ogr2ogr -f "PostgreSQL" "PG:host=localhost user=tchet dbname=gis password=xxxxxx" \
        AD_1_MunicipalSection_WSG84.shp \
       -nlt PROMOTE_TO_MULTI
ogr2ogr -f "PostgreSQL" "PG:host=localhost user=tchet dbname=gis password=xxxxxx" \
        AD_0_StatisticSector_WSG84.shp \
       -nlt PROMOTE_TO_MULTI

psql -d gis -c 'CREATE TABLE zipcode_all (ZIP CHAR(4));'
cp /home/tchet/gis/zipcode.txt /tmp/zipcode.txt
psql -d gis -c "COPY zipcode_all FROM '/tmp/zipcode.txt' DELIMITER ';';"

psql -d gis -c 'CREATE TABLE overide ( CODE CHAR(5), LETTRE CHAR(1), ZIP CHAR(4), NAME CHAR(30));'
cp /home/tchet/gis/overide.csv /tmp/overide.csv
psql -d gis -c "COPY overide FROM '/tmp/overide.csv' DELIMITER ';';"
psql -d gis -c 'DROP TABLE zipcode;'
psql -d gis -f gis.sql
psql -d gis -c 'select l.zip from zipcode_all l left join zipcode r on l.zip=r.postalcode where r.postalcode is null;'
pgsql2shp gis zipcode
cp zipcode*.* /var/www/share -v

gis.sql


CREATE TABLE zipcode as
(
SELECT Z.POSTALCODE,ST_Union(wkb_geometry) as wkb_geometry
FROM
(
 SELECT POSTALCODE,ST_Union(wkb_geometry) as wkb_geometry
 FROM      ad_1_municipalsection_WSG84 L
 LEFT JOIN overide                     R
 ON        L.POSTALCODE=R.ZIP
 WHERE     ZIP IS NULL
 GROUP BY  POSTALCODE

 UNION

 SELECT R.ZIP as POSTALCODE,ST_Union(wkb_geometry) as wkb_geometry
 FROM       ad_0_statisticsector_WSG84 L
 INNER JOIN overide                    R
 ON         SUBSTR(L.NISCODE,1,6)=R.CODE || R.LETTRE
 GROUP BY   R.ZIP

 UNION

 SELECT     ZIP,ST_Union(wkb_geometry) as wkb_geometry
 FROM
 (
        SELECT CASE NISCODE
                        WHEN '11002G72-' THEN '2020'
                        WHEN '11002G73-' THEN '2020'
                        WHEN '11002G74-' THEN '2020'
                        WHEN '11002G75-' THEN '2020'
                        WHEN '11002G780' THEN '2020'
                        WHEN '11002G51-' THEN '2018'
                        WHEN '11002G522' THEN '2018'
                        WHEN '11002G53-' THEN '2018'
                        WHEN '11002G54-' THEN '2018'
                        WHEN '11002G552' THEN '2018'
                        WHEN '11002G59-' THEN '2018'
                        WHEN '54010E000' THEN '7782'
                        WHEN '54010E08-' THEN '7782'
                        WHEN '54010E099' THEN '7782'
                        WHEN '54010E10-' THEN '7783'
                        WHEN '54010E110' THEN '7782'
                        WHEN '54010E112' THEN '7783'
                        WHEN '54010E12-' THEN '7783'
                        WHEN '54010E181' THEN '7783'
                        WHEN '54010E21-' THEN '7783'
                                         ELSE '0000'     
               END AS ZIP,wkb_geometry
        FROM   ad_0_statisticsector_WSG84
        WHERE  NISCODE LIKE '11002G%'
        OR     NISCODE LIKE '54010E%'
 ) Y
 GROUP BY Y.ZIP

) Z
GROUP BY Z.POSTALCODE
)

overide.csv


11002;A;2000;ANTWERPEN
11002;B;2050;LINKEROEVER
11002;C;2060;ANTWERPEN OOST
11002;D;2018;ANTWERPEN ZUID
11002;E;2000;ANTWERPEN
11002;F;2020;ANWTERPEN WEST
11002;H;2060;ANTWERPEN OOST
11002;J;2030;ANTWERPEN NOORD
11002;K;2030;ANTWERPEN NOORD
21004;A;1000;BRUXELLES
21004;B;1000;BRUXELLES
21004;C;1000;BRUXELLES
21004;D;1000;BRUXELLES
21004;E;1020;LAEKEN
21004;F;1120;NEDER-OVERHEEMBEEK
21004;G;1130;HAREN
25121;A;1340;OTTIGNIES
25121;B;1341;CEROUX-MOUSTY
25121;C;1342;LIMELETTE
25121;D;1348;LOUVAIN-LA-NEUVE ?
24062;A;3000;LEUVEN
24062;B;3012;WILSELE
24062;C;3010;KESSEL-LO
24062;D;3001;HEVERLEE
24062;E;3000;KORBEEK-LO
24062;F;3018;WIJGMAAL
54010;e;7782;PLOEGSTEERT
54010;e;7783;BIZET
62063;A;4000;LIEGE
62063;B;4020;LIEGE
62063;C;4020;WANDRE
62063;D;4020;JUPILLE
62063;E;4020;BRESSOUX
62063;F;4030;GRIVEGNEE
62063;G;4032;CHENEE
62063;H;4031;ANGLEUR
62063;J;4000;SCLESSIN
62063;K;4000;GLAIN
62063;L;4000;ROCOURT
62063;M;4000;SART-TILMAN ?

Final result

This is exactly the data we needed for our SAS Visual Analytics server.

Expected Final Result

gis.sh, gis.sql and overide.csv are licensed under these terms:

Copyright (C) 2016 Alexandre Detiste

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.



(c) Alexandre Detiste 2016 - alexandre (at) detiste (dot) be
Sauf spécifié autrement, les documents de ce site internet peuvent être librement copiés, a condition de ne pas être modifiés.

Valid HTML 4.0!