{ "cells": [ { "cell_type": "markdown", "id": "d65b6d0f", "metadata": {}, "source": [ "# πŸ—ΊοΈ raster2poly β€” Classification & Vectorisation Cookbook\n", "\n", "This notebook demonstrates every classification method in **`raster2poly`**\n", "using Landsat 9 geological ratio imagery.\n", "\n", "**Input raster:** `LC09_193036_20230713_geo_ratios.tif` β€” a 15-band geological\n", "ratio stack produced by the `landsat9geo` pipeline.\n", "\n", "| Band | Ratio | Geological target |\n", "|------|-------|-------------------|\n", "| 1 | Iron Oxide (Red/Blue) | Fe³⁺ gossans, laterite |\n", "| 2 | Ferrous Iron (SWIR1/Red) | Fe²⁺ mafics, chlorite |\n", "| 3 | Clay/Hydroxyl (SWIR1/SWIR2) | Al-OH kaolinite, illite |\n", "| 4 | Carbonate (SWIR2/NIR) | CO₃²⁻ calcite, dolomite |\n", "| 5 | Ferric Oxide (Red/Green) | Hematite / goethite |\n", "| 6 | NDVI | Vegetation |\n", "| … | … | … |\n" ] }, { "cell_type": "markdown", "id": "b91b7e49", "metadata": {}, "source": [ "## 0 β€” Setup" ] }, { "cell_type": "code", "execution_count": null, "id": "31e24cf6", "metadata": {}, "outputs": [], "source": [ "# !pip install raster2poly\n", "from raster2poly import RasterClassifier\n" ] }, { "cell_type": "markdown", "id": "aa225609", "metadata": {}, "source": [ "## 1 β€” Load raster & inspect\n", "\n", "`RasterClassifier` reads all bands on init and reports dimensions,\n", "band count, and the percentage of valid (non-nodata) pixels.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "6ca61577", "metadata": {}, "outputs": [], "source": [ "RASTER_PATH = \"LC09_193036_20230713_geo_ratios.tif\"\n", "clf = RasterClassifier(RASTER_PATH)\n" ] }, { "cell_type": "markdown", "id": "82022117", "metadata": {}, "source": [ "### 1a β€” List available algorithms\n", "\n", "A quick reference for all supported methods β€” callable as a\n", "static method or on an instance.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "de5b245b", "metadata": {}, "outputs": [], "source": [ "RasterClassifier.available_algorithms()\n" ] }, { "cell_type": "markdown", "id": "b96f64da", "metadata": {}, "source": [ "### 1b β€” Band statistics\n", "\n", "Before designing DN-range rules you need to know the value range\n", "of each band. `band_stats()` prints min / max / mean / std for\n", "every band in one call β€” no manual rasterio loop needed.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "d26c3460", "metadata": {}, "outputs": [], "source": [ "clf.band_stats()\n" ] }, { "cell_type": "markdown", "id": "089a0493", "metadata": {}, "source": [ "---\n", "\n", "## 2 β€” Unsupervised classification (clustering)\n", "\n", "No training data required. The algorithm groups pixels into *k*\n", "spectral clusters based purely on their multi-band signature.\n", "\n", "### 2a β€” Standard KMeans\n", "\n", "Best for small-to-medium rasters (< 10 M pixels).\n" ] }, { "cell_type": "code", "execution_count": null, "id": "391a11b8", "metadata": {}, "outputs": [], "source": [ "gdf_kmeans = clf.unsupervised(\n", " n_clusters=5,\n", " algorithm=\"kmeans\",\n", " dissolve=True, # merge adjacent same-class polygons\n", " min_area=50.0, # drop tiny speckle polygons (mΒ²)\n", ")\n", "\n", "clf.save(gdf_kmeans, \"output_kmeans.gpkg\")\n", "print(f\"\\nClasses found: {sorted(gdf_kmeans['class_id'].unique())}\")\n", "print(f\"Total polygons: {len(gdf_kmeans)}\")\n" ] }, { "cell_type": "markdown", "id": "1bff5b62", "metadata": {}, "source": [ "### 2b β€” MiniBatchKMeans (recommended for large rasters)\n", "\n", "Processes the data in small batches β†’ drastically lower RAM usage\n", "and faster convergence on rasters with tens of millions of pixels.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "6da1f960", "metadata": {}, "outputs": [], "source": [ "gdf_mini = clf.unsupervised(\n", " n_clusters=8,\n", " algorithm=\"mini_batch_kmeans\",\n", " dissolve=True,\n", " min_area=100.0,\n", ")\n", "\n", "clf.save(gdf_mini, \"output_minibatch.shp\")\n" ] }, { "cell_type": "markdown", "id": "2ee5981b", "metadata": {}, "source": [ "---\n", "\n", "## 3 β€” Supervised classification (Random Forest from ROI)\n", "\n", "Requires a shapefile / GeoPackage with labelled training geometries\n", "(Points **or** Polygons β€” or both). The column holding the labels\n", "must contain **integer** class IDs.\n", "\n", "### 3a β€” Encode string labels β†’ integer IDs\n", "\n", "If your ROI has a text column (e.g. `\"Age\"` with values like\n", "*Holocene*, *Jurassic*, …) you can convert it with `encode_roi()`\n", "β€” no external pandas step needed.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "6e6aa652", "metadata": {}, "outputs": [], "source": [ "roi_encoded_path, label_mapping = clf.encode_roi(\n", " \"ages2.shp\",\n", " label_col=\"Age\",\n", " # output_path=\"ages2_encoded.shp\", # optional, auto-generated if omitted\n", ")\n", "\n", "# label_mapping is a dict: {'Holocene': 1, 'Jurassic': 2, ...}\n", "print(\"Label mapping:\")\n", "for name, idx in label_mapping.items():\n", " print(f\" {idx}: {name}\")\n" ] }, { "cell_type": "markdown", "id": "89857ffd", "metadata": {}, "source": [ "### 3b β€” Train & classify\n", "\n", "Every pixel inside each polygon ROI is used as a training sample.\n", "This is far more robust than the old zonal-mean approach.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "edbb3b57", "metadata": {}, "outputs": [], "source": [ "gdf_rf = clf.supervised(\n", " roi_path=roi_encoded_path,\n", " class_col=\"class_id\", # the column created by encode_roi()\n", " n_estimators=100, # number of Random Forest trees\n", " dissolve=True,\n", " min_area=25.0,\n", ")\n", "\n", "clf.save(gdf_rf, \"output_random_forest.geojson\")\n", "print(f\"\\n{len(gdf_rf)} polygons across {gdf_rf['class_id'].nunique()} classes\")\n" ] }, { "cell_type": "markdown", "id": "2bf27eb7", "metadata": {}, "source": [ "### 3c β€” Inspect feature importances\n", "\n", "After `.supervised()` the fitted model is stored on\n", "`clf._last_model` β€” use it to check which bands the\n", "classifier found most discriminating.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "60bb8e25", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "importances = clf._last_model.feature_importances_\n", "band_labels = [f\"Band {i+1}\" for i in range(len(importances))]\n", "\n", "fig, ax = plt.subplots(figsize=(10, 4))\n", "ax.barh(band_labels, importances)\n", "ax.set_xlabel(\"Feature importance\")\n", "ax.set_title(\"Random Forest β€” band importance\", fontweight=\"bold\")\n", "plt.tight_layout()\n", "plt.show()\n" ] }, { "cell_type": "markdown", "id": "80597880", "metadata": {}, "source": [ "---\n", "\n", "## 4 β€” Rule-based classification (DN ranges)\n", "\n", "Define per-band thresholds manually. Useful when you have domain\n", "knowledge about the spectral signature of each class.\n", "\n", "> **Tip:** run `clf.band_stats()` first (section 1b) to see the\n", "> actual value ranges before writing rules.\n", "\n", "Rules format: `{class_id: [(band, min, max), …]}`\n", "\n", "- Band numbers are **1-based**\n", "- A pixel must satisfy **all** conditions in the list\n", "- Unclassified pixels are dropped (class 0 = nodata)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "46f2cb8e", "metadata": {}, "outputs": [], "source": [ "dn_rules = {\n", " # Class 1: Vegetation-like β€” high clay ratio, moderate carbonate\n", " 1: [\n", " (3, 1.0, 3.5), # Band 3 (Clay/Hydroxyl) high\n", " (4, 0.9, 1.8), # Band 4 (Carbonate) moderate\n", " (2, 0.6, 1.2), # Band 2 (Ferrous) moderate\n", " (1, 1.0, 6.0), # Band 1 (Iron Oxide) not too low\n", " ],\n", "\n", " # Class 2: Bare soil / dry surfaces β€” balanced mid-range values\n", " 2: [\n", " (1, 1.0, 4.0),\n", " (2, 0.7, 1.3),\n", " (3, 0.5, 2.0),\n", " (5, 0.4, 1.5),\n", " ],\n", "\n", " # Class 3: Water / very low reflectance\n", " 3: [\n", " (2, 0.49, 0.8),\n", " (3, 0.27, 0.8),\n", " (4, 0.72, 1.0),\n", " (5, 0.18, 0.6),\n", " ],\n", "\n", " # Class 4: Bright surfaces β€” urban / light-coloured rocks\n", " 4: [\n", " (1, 4.0, 10.4), # very high iron-oxide ratio\n", " (3, 2.0, 3.5),\n", " (4, 1.2, 2.0),\n", " (5, 1.0, 2.3),\n", " ],\n", "}\n", "\n", "gdf_dn = clf.from_dn_ranges(\n", " rules=dn_rules,\n", " dissolve=True,\n", " min_area=0.0, # keep all polygons for strict extraction\n", ")\n", "\n", "clf.save(gdf_dn, \"output_dn_rules.gpkg\")\n", "print(f\"\\nClasses: {sorted(gdf_dn['class_id'].unique())}\")\n" ] }, { "cell_type": "markdown", "id": "267bdbfb", "metadata": {}, "source": [ "---\n", "\n", "## 5 β€” Quick visual comparison\n", "\n", "Plot the polygon counts per class for each method side by side.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "fed3614d", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "fig, axes = plt.subplots(1, 3, figsize=(16, 4))\n", "\n", "for ax, (gdf, title) in zip(axes, [\n", " (gdf_kmeans, \"KMeans (5 clusters)\"),\n", " (gdf_rf, \"Random Forest\"),\n", " (gdf_dn, \"DN Rules\"),\n", "]):\n", " counts = gdf[\"class_id\"].value_counts().sort_index()\n", " counts.plot.bar(ax=ax, color=\"steelblue\")\n", " ax.set_title(title, fontweight=\"bold\")\n", " ax.set_xlabel(\"Class ID\")\n", " ax.set_ylabel(\"Polygon count\")\n", "\n", "plt.suptitle(\"Polygon counts per class β€” three classification methods\",\n", " fontsize=13, fontweight=\"bold\", y=1.03)\n", "plt.tight_layout()\n", "plt.show()\n" ] }, { "cell_type": "markdown", "id": "75362460", "metadata": {}, "source": [ "---\n", "\n", "## πŸ“‹ Output files\n", "\n", "| File | Method | Format |\n", "|------|--------|--------|\n", "| `output_kmeans.gpkg` | Unsupervised KMeans | GeoPackage |\n", "| `output_minibatch.shp` | Unsupervised MiniBatchKMeans | Shapefile |\n", "| `output_random_forest.geojson` | Supervised Random Forest | GeoJSON |\n", "| `output_dn_rules.gpkg` | Rule-based DN ranges | GeoPackage |\n", "\n", "All outputs have a `class_id` column and dissolved, cleaned polygon geometries.\n", "Load directly into QGIS / ArcGIS for further analysis.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }