Classification methodsο
This page explains each method in detail, when to use it, and what the parameters do.
Unsupervised β clf.unsupervised()ο
Groups pixels into k spectral clusters with no training data.
gdf = clf.unsupervised(
n_clusters=5,
algorithm="kmeans", # or "mini_batch_kmeans"
dissolve=True, # merge adjacent same-class polygons
min_area=50.0, # drop polygons < 50 mΒ²
)
Algorithm choice:
"kmeans"Standard Lloydβs algorithm. Loads all valid pixels into RAM. Best for rasters under ~10 M pixels.
"mini_batch_kmeans"Processes pixels in batches of 10 000. Convergence is slightly noisier but uses far less memory and runs much faster on large scenes.
Tip
Call RasterClassifier.available_algorithms() to see all options
with inline usage examples.
Supervised β clf.supervised()ο
Trains a Random Forest classifier from labelled training geometries.
gdf = clf.supervised(
roi_path="training_rois.shp",
class_col="class_id", # column with integer labels
n_estimators=100, # RF trees
dissolve=True,
min_area=25.0,
)
ROI requirements:
The file may contain Points, Polygons, or both (mixed geometry).
For polygons, every pixel inside the geometry is used as a training sample β no lossy zonal-mean shortcut.
The label column must contain integers. If you have string labels, use
encode_roi()first.CRS is automatically reprojected to match the raster β never the reverse.
After classification the fitted model is available at
clf._last_model. Use it to inspect feature importances:
importances = clf._last_model.feature_importances_
Rule-based β clf.from_dn_ranges()ο
Classify pixels by per-band value thresholds. No training data and no statistical model β you define the rules from domain knowledge.
rules = {
1: [(4, 0.15, 1.0), (5, 0.0, 0.10)], # class 1: high B4 AND low B5
2: [(5, 0.25, 1.0)], # class 2: high B5
}
gdf = clf.from_dn_ranges(rules)
Rule format: {class_id: [(band, min, max), β¦]}
Band numbers are 1-based.
A pixel must satisfy all conditions in the list to be assigned that class.
If multiple rules overlap, the last matching class wins.
Class 0 is reserved for nodata / unclassified.
Tip
Run clf.band_stats() first to see min / max / mean / std for
every band, then design rules accordingly.
Utility helpersο
available_algorithms()ο
Print a formatted list of all supported algorithms with usage snippets. Works as a static method or on an instance:
RasterClassifier.available_algorithms()
# β or β
clf.available_algorithms()
band_stats()ο
Print per-band statistics directly from the raster:
clf.band_stats()
# Band 1: min=0.8706 max=10.4115 mean=2.1423 std=0.9812
# Band 2: min=0.4902 max=1.6621 mean=0.9134 std=0.1089
# ...
encode_roi()ο
Convert a string/categorical label column to consecutive integer IDs:
out_path, mapping = clf.encode_roi(
"geology.shp",
label_col="Formation",
)
# mapping = {'Alluvium': 1, 'Granite': 2, 'Schist': 3}
# Feed directly into supervised():
gdf = clf.supervised(roi_path=out_path, class_col="class_id")
Labels are sorted alphabetically and numbered from 1 (0 is reserved for nodata). The encoded file is saved next to the input by default.
Polygon options (all methods)ο
dissolve(defaultTrue)Merge adjacent polygons of the same class into single features, then explode MultiPolygons to simple Polygons.
min_area(default0.0)Drop polygons smaller than this threshold (in map unitsΒ²). Set to e.g.
50.0to remove 1β2 pixel speckle.
Output formatsο
clf.save() auto-detects format from the file extension:
clf.save(gdf, "out.shp") # Shapefile
clf.save(gdf, "out.gpkg") # GeoPackage (recommended)
clf.save(gdf, "out.geojson") # GeoJSON