carto_flow.proportional_cartogram.partition

Batch geometry partitioning for GeoDataFrames.

This module provides the partition_geometries function for batch processing of geometries in a GeoDataFrame using either shrinking or splitting methods. It supports optional parallel processing and progress bars.

Functions:

partition_geometries –

Process geometries in a GeoDataFrame using either shrinking or splitting methods.

partition_geometries

partition_geometries(
    gdf: DataFrame | GeoDataFrame,
    columns: str | Sequence[str],
    method: Literal["shrink", "split"] = "shrink",
    normalization: Literal[
        "sum", "maximum", "row", None
    ] = None,
    simplify: float | None = None,
    mode: Literal["area", "shell"] = "area",
    tol: float = 0.05,
    direction: Literal[
        "vertical", "horizontal"
    ] = "vertical",
    alternate: bool = True,
    strategy: Literal[
        "sequential", "treemap"
    ] = "sequential",
    treemap_reference: Literal["mean", "equal"]
    | Sequence[float]
    | None = None,
    invert: bool = False,
    copy: bool = True,
    n_jobs: int = 1,
    progress: bool = False,
) -> gpd.GeoDataFrame

Process geometries in a GeoDataFrame using either shrinking or splitting methods.

This function applies geometric operations to all geometries in a GeoDataFrame based on values in one or more specified columns. Supports both single-fraction operations (one column) and multi-fraction operations (multiple columns for N-way splits/shrinks).

Parameters:

gdf (Union[DataFrame, GeoDataFrame]) –

Input DataFrame containing geometries.
columns (str or Sequence[str]) –
Column name(s) containing values to use for area scaling/splitting.
- Single string: One fraction per geometry. Output includes the processed geometry and its complement.
- List of strings: Multiple fractions per geometry (N-way split/shrink). Each column provides one fraction. Output includes N geometry columns (one per input column) plus a complement if the fractions sum to less than 1.0.
method (('shrink', 'split'), default: 'shrink' ) –
Processing method to apply:
- 'shrink': Creates concentric shells from inside to outside. For N fractions, produces N parts (1 core + N-1 shells).
- 'split': Divides geometries into parts with specified area ratios. For N fractions, produces N parts.
normalization (('sum', 'maximum', 'row', None), default: None ) –
Normalization method for computing fractions:
- 'sum': Normalize by sum of all row sums. Each value becomes value / (sum of all values across all columns and rows). All geometries will have remainders.
- 'maximum': Normalize by maximum row sum. The geometry with the largest total gets no remainder; others are scaled proportionally.
- 'row': Normalize each row independently to sum to 1.0. No remainders. Only valid for multiple columns.
- None: Use values directly as fractions. Values should be in [0, 1] range and row sums must not exceed 1.0.
simplify (float, default: None ) –

Simplification tolerance (Douglas-Peucker). Only used with 'shrink' method.
mode (('area', 'shell'), default: 'area' ) –
Interpretation mode for fractions (only used with 'shrink' method):
- 'area': Fractions represent direct area ratios
- 'shell': Fractions represent shell thickness ratios (squared for area)
tol (float, default: 0.05 ) –

Tolerance for root finding (shrink) or area matching (split).
direction (('vertical', 'horizontal'), default: 'vertical' ) –

Initial direction for splitting (only used with 'split' method).
alternate (bool, default: True ) –

Whether to alternate direction for sequential splits (only used with 'split' method and strategy='sequential').
strategy (('sequential', 'treemap'), default: 'sequential' ) –
Splitting strategy for N-way splits (only used with 'split' method):
- 'sequential': Parts are carved off one-by-one from edges.
- 'treemap': Recursive binary partitioning for grid-like patterns.
treemap_reference (('mean', 'equal'), Sequence[float] or None, default: None ) –
Only used when method='split' and strategy='treemap'. Fixes the tree grouping structure so all geometries share the same spatial layout:
- 'equal': Reference fractions are [1/N, …, 1/N]. Produces a symmetric, count-balanced tree. Good when the typical distribution is unknown.
- 'mean': Reference fractions are the column-wise mean across all rows. Reflects the dataset's typical distribution.
- Sequence[float]: User-provided reference fractions (same length as the number of parts including any remainder column).
In all cases, split ratios are still computed from each row's actual fractions, so part areas exactly match the data.
invert (bool, default: False ) –

Whether to invert computed fractions (1 - fraction).
copy (bool, default: True ) –

Whether to return a copy of the input DataFrame.
n_jobs (int, default: 1 ) –
Number of parallel jobs for processing geometries.
- 1: Sequential processing (no parallelization)
- -1: Use all available CPU cores
- n > 1: Use n parallel workers
Parallelization is beneficial for large datasets (>100 geometries). Requires joblib package for n_jobs != 1.
progress (bool, default: False ) –

Whether to display a progress bar during processing. Requires tqdm package when enabled.
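
The interaction between mode and tol for the 'shrink' method can be pictured with a small standalone sketch. It is not the library's implementation: scaling a polygon about its vertex mean stands in for whatever shrinking operation the library actually uses, tol is assumed to be a relative tolerance on the area fraction, and 'shell' is read as "the fraction is a linear ratio, so the target area fraction is its square". The helper names shoelace_area, scale_about and shrink_to_fraction are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def shoelace_area(ring: List[Point]) -> float:
    """Area of a simple polygon given as a list of vertices (shoelace formula)."""
    total = 0.0
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def scale_about(ring: List[Point], center: Point, factor: float) -> List[Point]:
    """Scale every vertex towards `center` by `factor`."""
    cx, cy = center
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in ring]


def shrink_to_fraction(
    ring: List[Point], fraction: float, mode: str = "area", tol: float = 0.05
) -> List[Point]:
    """Bisect on a scale factor until the area fraction is within tolerance."""
    # Assumption: in 'shell' mode the fraction is a linear ratio, so the
    # corresponding area fraction is its square.
    target = fraction ** 2 if mode == "shell" else fraction
    full_area = shoelace_area(ring)
    # Mean of the vertices; equals the centroid for the regular shape used below.
    center = (
        sum(x for x, _ in ring) / len(ring),
        sum(y for _, y in ring) / len(ring),
    )
    lo, hi = 0.0, 1.0
    candidate = ring
    for _ in range(100):  # hard cap so the loop always terminates
        mid = (lo + hi) / 2.0
        candidate = scale_about(ring, center, mid)
        ratio = shoelace_area(candidate) / full_area
        if abs(ratio - target) <= tol * target:  # assumed relative tolerance
            break
        if ratio < target:
            lo = mid
        else:
            hi = mid
    return candidate


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
core_area = shrink_to_fraction(square, 0.5, mode="area", tol=0.01)
core_shell = shrink_to_fraction(square, 0.5, mode="shell", tol=0.01)
print(shoelace_area(core_area) / 16.0)   # close to 0.5
print(shoelace_area(core_shell) / 16.0)  # close to 0.25
```

Under these assumptions, a looser tol ends the search after fewer iterations at the cost of a less exact area match.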

Returns:

GeoDataFrame –
GeoDataFrame with processed geometries. Output columns depend on input:
- Single column 'col': geometry_col, geometry_complement
- Multiple columns ['a', 'b', 'c']: geometry_a, geometry_b, geometry_c, and geometry_complement if any row has remainder.
For shrink method with multiple columns, parts are ordered from innermost core to outermost shell (first column = core). For split method, parts correspond to each fraction.
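
The naming rule above can be restated as a small sketch. The helper output_geometry_columns is hypothetical and only mirrors the documented column names; it does not call the library.

```python
from typing import List, Sequence, Union


def output_geometry_columns(
    columns: Union[str, Sequence[str]], has_remainder: bool
) -> List[str]:
    """Derive the documented output column names from the input column name(s)."""
    single = isinstance(columns, str)
    names = [columns] if single else list(columns)
    out = [f"geometry_{name}" for name in names]
    # A single column always comes with its complement; for multiple columns
    # the complement appears only when some row has a remainder.
    if single or has_remainder:
        out.append("geometry_complement")
    return out


print(output_geometry_columns("col", has_remainder=False))
print(output_geometry_columns(["a", "b", "c"], has_remainder=True))
print(output_geometry_columns(["a", "b", "c"], has_remainder=False))
```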

Raises:

ValueError –

If columns are not found, contain invalid values, or have row sums that exceed 1.0 when normalization is None.
TypeError –

If the input is not a pandas DataFrame or lacks a geometry column.
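
When normalization is None, values are used directly as fractions, so it can help to check the data before calling the function. The following is a minimal sketch of such a pre-check, assuming the conditions listed above (values in [0, 1], row sums not exceeding 1.0). The helper check_raw_fractions and its eps slack for floating-point rounding are assumptions and are not part of the library.

```python
from typing import Sequence


def check_raw_fractions(rows: Sequence[Sequence[float]], eps: float = 1e-9) -> None:
    """Raise ValueError if any row is unusable as raw fractions."""
    for index, row in enumerate(rows):
        for value in row:
            # NaN fails this comparison too, so it is rejected as invalid.
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"row {index}: value {value!r} is outside [0, 1]")
        if sum(row) > 1.0 + eps:
            raise ValueError(f"row {index}: fractions sum to {sum(row)}, above 1.0")


check_raw_fractions([[0.2, 0.3], [0.5, 0.5]])  # passes silently

try:
    check_raw_fractions([[0.7, 0.6]])
except ValueError as err:
    print(err)
```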

Notes

Normalization methods for multiple columns:

- 'sum': value / (sum of all row sums)
  - Preserves relative proportions both within and across geometries
  - Geometries with larger totals get more filled area
  - All geometries have remainders
- 'maximum': value / (max row sum)
  - Row with largest total fills completely (no remainder)
  - Other rows scaled proportionally with remainders
- 'row': value / (that row's sum)
  - Each geometry's fractions sum to 1.0, no remainders
  - Only considers relative proportions within each geometry
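
The arithmetic behind these normalization methods can be reproduced on plain lists. This is an illustrative sketch only: normalize_fractions is a hypothetical helper that follows the formulas above, not the library's code.

```python
from typing import List, Optional


def normalize_fractions(
    rows: List[List[float]], method: Optional[str]
) -> List[List[float]]:
    """Turn raw values into fractions following the documented formulas."""
    row_sums = [sum(row) for row in rows]
    if method == "sum":
        denominators = [sum(row_sums)] * len(rows)  # sum of all row sums
    elif method == "maximum":
        denominators = [max(row_sums)] * len(rows)  # largest row sum
    elif method == "row":
        denominators = row_sums                     # each row's own sum
    elif method is None:
        denominators = [1.0] * len(rows)            # values used as-is
    else:
        raise ValueError(f"unknown normalization: {method!r}")
    return [[value / d for value in row] for row, d in zip(rows, denominators)]


# Three geometries, two value columns each (row sums 4, 2 and 2).
rows = [[1.0, 3.0], [1.0, 1.0], [1.0, 1.0]]

for method in ("sum", "maximum", "row"):
    fractions = normalize_fractions(rows, method)
    remainders = [1.0 - sum(row) for row in fractions]
    print(method, fractions, remainders)
```

With this data, 'maximum' gives the first geometry the fractions 0.25 and 0.75 with no remainder, while the other two keep a remainder of 0.5. 'row' leaves no remainder anywhere.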

Examples:

Single column shrinking:

>>> import geopandas as gpd
>>> from carto_flow.proportional_cartogram import partition_geometries
>>> # gdf is assumed to be an existing GeoDataFrame with a numeric 'population' column
>>> result = partition_geometries(gdf, 'population', normalization='sum')
>>> # Output columns: geometry_population, geometry_complement

Multi-column N-way splitting:

>>> # Split each geometry into 3 parts based on sector values
>>> result = partition_geometries(
...     gdf,
...     columns=['agriculture', 'industry', 'services'],
...     method='split',
...     normalization='row'
... )
>>> # Output columns: geometry_agriculture, geometry_industry, geometry_services

Multi-column shrinking with remainder:

>>> # Shrink based on category values, keeping complement
>>> result = partition_geometries(
...     gdf,
...     columns=['cat_a', 'cat_b'],
...     method='shrink',
...     normalization='maximum'
... )
>>> # Output: geometry_cat_a (core), geometry_cat_b (outer shell),
>>> #         geometry_complement (outermost unfilled area, if any row sum < 1.0)

Parallel processing with progress bar:

>>> result = partition_geometries(
...     gdf,
...     columns='population',
...     n_jobs=-1,      # Use all CPU cores
...     progress=True   # Show progress bar
... )
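
The 'sequential' strategy with alternate=True, described under Parameters, can be pictured on a plain rectangle. This is a simplified sketch under stated assumptions: the library works on arbitrary geometries, whereas this code only handles axis-aligned rectangles given as (xmin, ymin, xmax, ymax); 'vertical' is assumed to mean a cut along a vertical line; and the helpers rect_area and sequential_split are hypothetical.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def rect_area(rect: Rect) -> float:
    return (rect[2] - rect[0]) * (rect[3] - rect[1])


def sequential_split(
    rect: Rect,
    fractions: Sequence[float],
    direction: str = "vertical",
    alternate: bool = True,
) -> Tuple[List[Rect], Rect]:
    """Carve one slab per fraction off the remaining rectangle, edge by edge."""
    total = rect_area(rect)
    remaining = rect
    parts: List[Rect] = []
    for fraction in fractions:
        left = rect_area(remaining)
        # Share of the *remaining* rectangle that this part must take so that
        # its area equals fraction * total.
        share = min(1.0, fraction * total / left) if left > 0 else 0.0
        xmin, ymin, xmax, ymax = remaining
        if direction == "vertical":  # assumed: cut along a vertical line
            cut = xmin + share * (xmax - xmin)
            parts.append((xmin, ymin, cut, ymax))
            remaining = (cut, ymin, xmax, ymax)
        else:  # cut along a horizontal line
            cut = ymin + share * (ymax - ymin)
            parts.append((xmin, ymin, xmax, cut))
            remaining = (xmin, cut, xmax, ymax)
        if alternate:
            direction = "horizontal" if direction == "vertical" else "vertical"
    return parts, remaining


parts, complement = sequential_split((0.0, 0.0, 4.0, 2.0), [0.5, 0.25])
# The two parts have areas 4.0 and 2.0; the leftover rectangle (the
# complement) has area 2.0, i.e. the remaining 25% of the total.
print([rect_area(p) for p in parts], rect_area(complement))
```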