Classification methods

This page explains each method in detail, when to use it, and what the parameters do.

Unsupervised: clf.unsupervised()

Groups pixels into k spectral clusters with no training data.

gdf = clf.unsupervised(
    n_clusters=5,
    algorithm="kmeans",          # or "mini_batch_kmeans"
    dissolve=True,               # merge adjacent same-class polygons
    min_area=50.0,               # drop polygons < 50 m²
)

Algorithm choice:

"kmeans"

Standard Lloyd’s algorithm. Loads all valid pixels into RAM. Best for rasters under ~10 M pixels.

"mini_batch_kmeans"

Processes pixels in batches of 10 000. Convergence is slightly noisier, but the algorithm uses far less memory and runs much faster on large scenes.
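The guidance above can be turned into a small selection rule. This is a minimal sketch, not part of the library: choose_algorithm() and batches_per_pass() are hypothetical helpers, the 10 M pixel limit is the rough figure quoted above, and the scene sizes are made up for illustration.

```python
import math

# Rough limit taken from the guidance above (~10 M pixels); tune it for your RAM.
KMEANS_PIXEL_LIMIT = 10_000_000
# Batch size stated above for "mini_batch_kmeans".
BATCH_SIZE = 10_000


def choose_algorithm(n_valid_pixels, limit=KMEANS_PIXEL_LIMIT):
    """Hypothetical helper: pick an algorithm name from the valid-pixel count."""
    return "kmeans" if n_valid_pixels < limit else "mini_batch_kmeans"


def batches_per_pass(n_valid_pixels, batch_size=BATCH_SIZE):
    """Number of batches needed to visit every valid pixel once."""
    return math.ceil(n_valid_pixels / batch_size)


small_scene = 2_000 * 2_000      # 4 M pixels
large_scene = 10_980 * 10_980    # about 120.6 M pixels

print(choose_algorithm(small_scene))   # kmeans
print(choose_algorithm(large_scene))   # mini_batch_kmeans
print(batches_per_pass(large_scene))   # 12057
```

The returned string can be passed as the algorithm argument of clf.unsupervised().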

Tip

Call RasterClassifier.available_algorithms() to see all options with inline usage examples.

Supervised: clf.supervised()

Trains a Random Forest classifier from labelled training geometries.

gdf = clf.supervised(
    roi_path="training_rois.shp",
    class_col="class_id",        # column with integer labels
    n_estimators=100,             # RF trees
    dissolve=True,
    min_area=25.0,
)

ROI requirements:

  • The file may contain Points, Polygons, or both (mixed geometry).

  • For polygons, every pixel inside the geometry is used as a training sample; there is no lossy zonal-mean shortcut.

  • The label column must contain integers. If you have string labels, use encode_roi() first.

  • ROI geometries are automatically reprojected to the raster CRS, never the reverse.
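To see why per-pixel sampling matters, compare it with a zonal mean on a toy polygon. This is an illustrative sketch only: both functions are hypothetical, the pixel values are made up, and the library's actual sampling code is not shown in this page.

```python
from statistics import mean


def per_pixel_samples(pixels, label):
    """One training sample per pixel inside the polygon."""
    return [(tuple(p), label) for p in pixels]


def zonal_mean_sample(pixels, label):
    """A single sample per polygon: the per-band mean of its pixels."""
    band_means = tuple(mean(band) for band in zip(*pixels))
    return [(band_means, label)]


# Three pixels inside one polygon, two bands each (made-up DN values).
pixels = [(100, 400), (200, 500), (300, 600)]

print(len(per_pixel_samples(pixels, 1)))   # 3 samples, spread preserved
print(len(zonal_mean_sample(pixels, 1)))   # 1 sample, spread lost
```

The zonal mean collapses the within-polygon variation that a Random Forest would otherwise learn from.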

After classification the fitted model is available at clf._last_model. Use it to inspect feature importances:

importances = clf._last_model.feature_importances_
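Raw importances are easier to read when paired with band numbers. The sketch below assumes one feature per band, in band order, which this page does not state explicitly; rank_bands() is a hypothetical helper and the values are made up.

```python
def rank_bands(importances):
    """Return (band_number, importance) pairs, most important first.

    Assumes one feature per band, in band order, with 1-based band numbers.
    """
    pairs = [(i, float(v)) for i, v in enumerate(importances, start=1)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up importances for a 4-band raster; in practice pass
# clf._last_model.feature_importances_ instead.
example = [0.10, 0.25, 0.45, 0.20]
for band, score in rank_bands(example):
    print(f"Band {band}: {score:.2f}")
```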

Rule-based: clf.from_dn_ranges()

Classifies pixels by per-band value thresholds. No training data or statistical model is involved: you define the rules from domain knowledge.

rules = {
    1: [(4, 0.15, 1.0), (5, 0.0, 0.10)],   # class 1: high B4 AND low B5
    2: [(5, 0.25, 1.0)],                      # class 2: high B5
}
gdf = clf.from_dn_ranges(rules)

Rule format: {class_id: [(band, min, max), …]}

  • Band numbers are 1-based.

  • A pixel must satisfy all conditions in the list to be assigned that class.

  • If multiple rules overlap, the last matching class wins.

  • Class 0 is reserved for nodata / unclassified.
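The rule semantics listed above can be sketched for a single pixel in plain Python. This is not the library's implementation: classify_pixel() is a hypothetical function, range bounds are assumed to be inclusive, and "last" is assumed to mean dictionary insertion order.

```python
def classify_pixel(pixel, rules):
    """Return the class ID for one pixel.

    pixel: sequence of band values; pixel[0] is band 1.
    rules: {class_id: [(band, min, max), ...]} with 1-based band numbers.
    Assumes inclusive bounds (not confirmed by the documentation).
    """
    result = 0  # class 0 = nodata / unclassified
    for class_id, conditions in rules.items():  # dicts keep insertion order
        if all(lo <= pixel[band - 1] <= hi for band, lo, hi in conditions):
            result = class_id  # a later match overwrites an earlier one
    return result


rules = {
    1: [(4, 0.15, 1.0), (5, 0.0, 0.10)],
    2: [(5, 0.25, 1.0)],
}

# Five-band pixels: values for bands 1 to 5.
print(classify_pixel([0.0, 0.0, 0.0, 0.30, 0.05], rules))  # 1
print(classify_pixel([0.0, 0.0, 0.0, 0.30, 0.40], rules))  # 2
print(classify_pixel([0.0, 0.0, 0.0, 0.05, 0.15], rules))  # 0

# Overlapping rules: band 1 = 0.4 satisfies both, so the later class wins.
overlapping = {1: [(1, 0.0, 0.5)], 2: [(1, 0.3, 1.0)]}
print(classify_pixel([0.4], overlapping))  # 2
```

Because a later class overwrites an earlier one, put the more specific classes last in the dictionary.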

Tip

Run clf.band_stats() first to see min / max / mean / std for every band, then design rules accordingly.

Utility helpers

available_algorithms()

Print a formatted list of all supported algorithms with usage snippets. Works as a static method or on an instance:

RasterClassifier.available_algorithms()
# or, on an instance:
clf.available_algorithms()

band_stats()

Print per-band statistics directly from the raster:

clf.band_stats()
# Band  1: min=0.8706  max=10.4115  mean=2.1423  std=0.9812
# Band  2: min=0.4902  max=1.6621   mean=0.9134  std=0.1089
# ...

encode_roi()

Convert a string/categorical label column to consecutive integer IDs:

out_path, mapping = clf.encode_roi(
    "geology.shp",
    label_col="Formation",
)
# mapping = {'Alluvium': 1, 'Granite': 2, 'Schist': 3}

# Feed directly into supervised():
gdf = clf.supervised(roi_path=out_path, class_col="class_id")

Labels are sorted alphabetically and numbered from 1 (0 is reserved for nodata). The encoded file is saved next to the input by default.
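The numbering scheme described above is easy to reproduce for checking a mapping by hand. This is a sketch of the scheme only: build_label_mapping() is a hypothetical function, and it assumes "alphabetically" means Python's default (case-sensitive) string ordering.

```python
def build_label_mapping(labels):
    """Map each distinct label to a consecutive integer ID, starting at 1.

    ID 0 is left free for nodata. Sorting uses Python's default string
    order, which is case-sensitive (an assumption about the library).
    """
    unique = sorted(set(labels))
    return {label: i for i, label in enumerate(unique, start=1)}


labels = ["Granite", "Schist", "Alluvium", "Granite"]
print(build_label_mapping(labels))
# {'Alluvium': 1, 'Granite': 2, 'Schist': 3}
```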

Polygon options (all methods)

dissolve (default True)

Merge adjacent polygons of the same class into single features, then explode MultiPolygons to simple Polygons.

min_area (default 0.0)

Drop polygons smaller than this threshold (in map units²). Set to e.g. 50.0 to remove 1–2 pixel speckle; the right value depends on the raster's pixel area.
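Since the threshold is an area, it scales with the square of the pixel size. The sketch below derives a threshold from a pixel count; min_area_for_speckle() is a hypothetical helper that assumes square pixels and a projected CRS with linear units.

```python
def min_area_for_speckle(pixel_size, max_pixels):
    """Threshold (map units squared) that drops polygons of up to max_pixels.

    Assumes square pixels. Half a pixel of headroom keeps the cut-off away
    from exact multiples of the pixel area.
    """
    pixel_area = pixel_size * pixel_size
    return (max_pixels + 0.5) * pixel_area


print(min_area_for_speckle(10.0, 2))  # 250.0
print(min_area_for_speckle(4.0, 2))   # 40.0
```

With 10 m pixels, a threshold of 250.0 drops 1- and 2-pixel polygons (100 and 200 m²) and keeps 3-pixel ones (300 m²).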

Output formats

clf.save() auto-detects format from the file extension:

clf.save(gdf, "out.shp")       # Shapefile
clf.save(gdf, "out.gpkg")      # GeoPackage (recommended)
clf.save(gdf, "out.geojson")   # GeoJSON
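Extension-based detection usually comes down to a lookup table. The sketch below shows the idea with GDAL's vector driver names; driver_for() is a hypothetical function, and whether clf.save() uses these names internally is an assumption.

```python
from pathlib import Path

# Extension -> GDAL vector driver name (the three formats listed above).
DRIVERS = {
    ".shp": "ESRI Shapefile",
    ".gpkg": "GPKG",
    ".geojson": "GeoJSON",
}


def driver_for(path):
    """Return the driver name for a path, based on its (case-insensitive) suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in DRIVERS:
        raise ValueError(f"Unsupported extension: {suffix!r}")
    return DRIVERS[suffix]


print(driver_for("out.gpkg"))   # GPKG
print(driver_for("OUT.SHP"))    # ESRI Shapefile
```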