Skip to content

Core Functions

src.same.run_same(ref_df, aligned_df, commonCT, outprefix=None, aligned_delaunay=None, aligned_delaunay_vertex_col=None, optim_params=None, gurobi_params=None, ignore_precomputed_triangulation=False)

Find optimal spatial matches between aligned and reference cells using MIP.

This is the core SAME optimization function. It formulates cell matching as a Mixed Integer Program (MIP) that minimizes cell type distance and coordinate distance while preserving spatial relationships through Delaunay triangle orientation constraints.

The function supports both "eager" mode (all constraints upfront) and "lazy" mode (constraints added on-demand via callbacks), with lazy mode being more memory-efficient for large problems.

Parameters:

Name Type Description Default
ref_df DataFrame

Reference dataset with required columns:

  • X, Y: Spatial coordinates
  • cell_type: Cell type annotation
  • Cell type probability columns (names in commonCT)
  • Cell ID column (name specified in optim_params['cell_id_col'])
required
aligned_df DataFrame or MetaCell

Aligned/moving dataset to match against reference. Same column requirements as ref_df. Can be a MetaCell object.

required
commonCT list of str

Names of columns containing cell type probabilities or one-hot encodings. These are used to compute cell type distance.

required
outprefix str

Output directory. If provided, saves:

  • matches_df.csv: Final matched pairs
  • aligned_df.csv, ref_df.csv: Filtered input data
  • var_out.npy: Optimization variables and diagnostics
  • matching_model.lp: Gurobi model file
None
aligned_delaunay (array - like, shape(n_triangles, 3))

Precomputed Delaunay triangulation. If None, computed automatically. Useful when using metacells with pre-filtered triangulation.

None
aligned_delaunay_vertex_col str

Column name containing vertex IDs that correspond to indices in aligned_delaunay. Required if triangulation uses non-default IDs.

None
optim_params dict

Optimization parameters from init_optim_params(). Key parameters:

  • radius: KNN search radius (default: 250)
  • knn: Number of nearest neighbors (default: 8)
  • max_matches: Max times ref can be matched (default: 1)
  • lazy_constraints: Use lazy constraint generation (default: True)
  • delaunay_penalty: Penalty for triangle flips (default: 5)
  • no_match_penalty: Penalty for unmatched cells (default: 100)
None
gurobi_params dict

Gurobi solver parameters from init_gurobi_params(). Key parameters:

  • time_limit: Max solve time in seconds (default: 7200)
  • mip_gap: Optimality gap tolerance (default: 0.05)
  • init_method: MIP start method ('greedy', 'hungarian', or None)
None
ignore_precomputed_triangulation bool

If True, compute fresh Delaunay even if precomputed one provided.

False

Returns:

Name Type Description
matches_df DataFrame

Matched pairs with columns:

  • aligned_idx, ref_idx: Row indices into filtered DataFrames
  • X, Y: Coordinates of aligned points
  • ref_X, ref_Y: Coordinates of matched reference points
  • size, ref_size: Metacell sizes (1 for individual cells)
  • Ref_{cell_id_col}, Aligned_{cell_id_col}: Original cell IDs
  • time_limit_reached: Whether solver hit time limit
  • triangle_violation: Whether point is in a flipped triangle
  • filtered_violation: Violation detected by both methods
  • run_time: Optimization time in seconds
var_out dict

Optimization diagnostics including:

  • x: Match variable values (list of 0/1)
  • no_match_vars: Unmatched penalty values
  • penalty_vars: Multi-match penalty values
  • area_penalty_vars: Triangle flip penalty values
  • violations: Spatial violation report
  • triangle_data: Triangle analysis (areas, flips)
  • lazy_cuts_added: Number of lazy constraints added
Notes

Objective Function:

The solver minimizes:

.. math::

\sum_{(i,j)} c_{ij} x_{ij} + \alpha \sum_j p_j + \beta \sum_i s_i n_i + \gamma \sum_t w_t q_t

where: - :math:c_{ij} = cell type distance + coordinate distance - :math:p_j = penalty for ref j matched more than once - :math:n_i = penalty for aligned i not matched, weighted by size :math:s_i - :math:q_t = penalty for triangle t orientation flip, weighted by :math:w_t

Lazy Constraints:

When lazy_constraints=True, triangle orientation constraints are added on-demand during optimization. This reduces memory from O(n×k³) to O(n) and is recommended for large problems (>1000 cells).

Space-Tearing:

Triangle orientation flips indicate space-tearing events where the spatial relationship between cells changes (e.g., due to tissue deformation or missing cells). The delaunay_penalty controls how strongly these are penalized.

Examples:

Basic usage:

>>> from src import run_same
>>> matches, var_out = run_same(
...     ref_df=ref_df,
...     aligned_df=aligned_df,
...     commonCT=['TypeA', 'TypeB', 'TypeC'],
...     outprefix='results/'
... )
>>> print(f"Found {len(matches)} matches")

With custom parameters:

>>> from src import run_same, init_optim_params, init_gurobi_params
>>> optim = init_optim_params(radius=500, lazy_constraints=True)
>>> gurobi = init_gurobi_params(time_limit=3600, mip_gap=0.01)
>>> matches, var_out = run_same(
...     ref_df=ref_df,
...     aligned_df=aligned_df,
...     commonCT=commonCT,
...     optim_params=optim,
...     gurobi_params=gurobi
... )
See Also

sliding_window_matching : Process large datasets in windows. init_optim_params : Create optimization parameters. init_gurobi_params : Create Gurobi parameters. greedy_triangle_collapse : Create metacells for faster optimization.


src.same.sliding_window_matching(ref, moving, commonCT=None, outprefix=None, moving_delaunay=None, moving_delaunay_vertex_col=None, optim_params=None, gurobi_params=None, ignore_precomputed_triangulation=False)

Match cells between reference and moving datasets using a sliding window approach.

This function divides the spatial domain into overlapping windows and runs the SAME optimization on each window. Results from overlapping regions are resolved using bipartite matching to ensure unique assignments.

Supports resumption from partial results when outprefix is provided.

Parameters:

Name Type Description Default
ref DataFrame or MetaCell

Reference dataset with columns ['X', 'Y', 'cell_type'] plus cell type probability columns. Can also be a MetaCell object with metacell_df attribute.

required
moving DataFrame or MetaCell

Moving/aligned dataset to be matched to reference. Same column requirements as ref. Can be a MetaCell object with metacell_df and metacell_delaunay attributes.

required
commonCT list of str

List of cell type column names (probability/one-hot columns). If None, inferred from unique values of 'cell_type' column.

None
outprefix str

Output directory path. If provided, intermediate results are saved and the function can resume from partial runs.

None
moving_delaunay array - like

Precomputed Delaunay triangulation for moving data as (n, 3) array of vertex indices. If moving is a MetaCell, this is extracted automatically.

None
moving_delaunay_vertex_col str

Column name in moving DataFrame containing vertex IDs that correspond to indices in moving_delaunay. Required if triangulation uses non-sequential IDs.

None
optim_params dict

Optimization parameters. See init_optim_params() for defaults. Key parameters include:

  • window_size: Size of each window (default: 1000)
  • overlap: Overlap between windows (default: 250)
  • radius: KNN search radius (default: 250)
  • knn: Number of nearest neighbors (default: 8)
  • lazy_constraints: Use memory-efficient constraints (default: True)
None
gurobi_params dict

Gurobi solver parameters. See init_gurobi_params() for defaults.

None
ignore_precomputed_triangulation bool

If True, compute fresh Delaunay triangulation even if precomputed one is available.

False

Returns:

Type Description
DataFrame

Matched pairs with columns:

  • aligned_idx, ref_idx: Integer indices into input DataFrames
  • X, Y: Coordinates of aligned points
  • ref_X, ref_Y: Coordinates of matched reference points
  • Cell type probability columns
  • size, ref_size: Metacell sizes (if applicable)
  • window_id: Which window the match came from
  • time_limit_reached: Whether solver hit time limit
  • triangle_violation: Whether point is in a flipped triangle
  • run_time: Optimization time for the window

Examples:

Basic usage:

>>> from src import sliding_window_matching, init_optim_params
>>> matches = sliding_window_matching(
...     ref=ref_df,
...     moving=aligned_df,
...     commonCT=['TypeA', 'TypeB', 'TypeC'],
...     outprefix='results/'
... )

With custom parameters:

>>> optim = init_optim_params(window_size=500, overlap=100, radius=300)
>>> matches = sliding_window_matching(
...     ref=ref_df,
...     moving=aligned_df,
...     optim_params=optim
... )

With metacells:

>>> from src import MetaCell
>>> mc_aligned = MetaCell(aligned_df, max_metacell_size=10)
>>> matches = sliding_window_matching(ref=ref_df, moving=mc_aligned)
See Also

run_same : Core optimization for a single region. init_optim_params : Create optimization parameter dictionary. init_gurobi_params : Create Gurobi parameter dictionary. merge_window_matches_unique_ref : Merge overlapping window results.


src.same.init_optim_params(**overrides)

Create default optimization parameters for SAME matching.

Returns a dictionary of parameters controlling the matching problem formulation, spatial constraints, and sliding window behavior. Use keyword arguments to override specific defaults.

Parameters:

Name Type Description Default
**overrides dict

Keyword arguments to override default values.

{}

Returns:

Type Description
Dict[str, Any]

Dictionary of optimization parameters with the following keys:

Sliding window parameters:

  • window_size : int, default=1000 Size of each window in coordinate units.
  • overlap : int, default=250 Overlap between adjacent windows.
  • min_cells_per_window : int, default=10 Minimum cells required to process a window.

Matching problem parameters:

  • max_matches : int, default=1 Maximum times a reference point can be matched.
  • ref_metacell_match_multiplier : int or None, default=None For metacells, multiplier for max_matches. None = use max size.
  • radius : float, default=250 Search radius for finding KNN candidates.
  • knn : int, default=8 Number of nearest neighbors to consider.

Objective function coefficients:

  • penalty_coeff : float, default=100 Penalty for reference point matched more than once.
  • no_match_penalty : float, default=100 Penalty for unmatched aligned point (per cell).
  • delaunay_penalty : float, default=5 Penalty for triangle orientation flip (space-tearing).
  • dist_ct_coeff : float, default=1 Weight for cell type distance in objective.

Output labeling:

  • cell_id_col : str, default='Cell_Num_Old' Column name for cell identifiers.

Constraint/behavior flags:

  • hard_spatial_constraints : bool, default=False If True, strictly forbid triangle flips (no soft penalty).
  • ignore_same_type_triangles : bool, default=True If True, skip constraints for triangles with homogeneous cell types.
  • ignore_knn_if_matched : bool, default=False If True, use cell-type priority in KNN search.
  • lazy_constraints : bool, default=True If True, add spatial constraints lazily (memory efficient).

Triangle quality filtering:

  • min_angle_deg : float, default=15 Filter out thin triangles with minimum angle below this threshold.

Examples:

>>> params = init_optim_params(radius=500, knn=10, lazy_constraints=True)
>>> params['radius']
500
See Also

init_gurobi_params : Create Gurobi solver parameters. run_same : Main optimization function using these parameters. sliding_window_matching : Sliding window approach for large datasets.


src.same.init_gurobi_params(**overrides)

Create default Gurobi solver parameters for SAME optimization.

Returns a dictionary of Gurobi-related parameters that control solver behavior, time limits, and lazy constraint generation. Use keyword arguments to override specific defaults.

Parameters:

Name Type Description Default
**overrides dict

Keyword arguments to override default values.

{}

Returns:

Type Description
Dict[str, Any]

Dictionary of Gurobi parameters with the following keys:

Core solve controls:

  • time_limit : int, default=7200 Maximum solve time in seconds (2 hours).
  • mip_gap : float, default=0.05 MIP optimality gap tolerance (5%).

Gurobi tuning knobs:

  • mip_focus : int, default=2 MIPFocus parameter (2 = focus on proving optimality).
  • cuts : int, default=2 Cut generation aggressiveness (0=none to 3=aggressive).
  • heuristics : float, default=0.1 Fraction of time spent on heuristics (0.0-1.0).

MIP start / initialization:

  • init_method : str or None, default=None Method for warm-starting the solver:
    • None: no initialization
    • 'greedy': fast greedy assignment
    • 'hungarian': optimal assignment (requires max_matches=1)
  • init_big_m : float, default=1e9 Large cost for forbidden pairs in Hungarian init.
  • init_hungarian_max_n : int, default=5000 Skip Hungarian init if n_aligned + n_ref exceeds this.

Lazy constraints:

  • lazy_max_cuts : int or None, default=None Global cap on lazy cuts added (None = unlimited).
  • lazy_allowed_flip_fraction : float, default=0.05 Allowed fraction of flipped triangles before adding cuts.
  • lazy_max_cuts_per_incumbent : int, default=1000 Max cuts added per incumbent solution.

Examples:

>>> params = init_gurobi_params(time_limit=3600, mip_gap=0.01)
>>> params['time_limit']
3600
See Also

init_optim_params : Create optimization parameters. run_same : Main optimization function using these parameters.