Core Functions

`src.same.run_same(ref_df, aligned_df, commonCT, outprefix=None, aligned_delaunay=None, aligned_delaunay_vertex_col=None, optim_params=None, gurobi_params=None, ignore_precomputed_triangulation=False)`

Find optimal spatial matches between aligned and reference cells using MIP.

This is the core SAME optimization function. It formulates cell matching as a Mixed Integer Program (MIP) that minimizes cell type distance and coordinate distance while preserving spatial relationships through Delaunay triangle orientation constraints.

The function supports both "eager" mode (all constraints upfront) and "lazy" mode (constraints added on-demand via callbacks), with lazy mode being more memory-efficient for large problems.

Parameters:

Name	Type	Description	Default
`ref_df`	`DataFrame`	Reference dataset with required columns: `X`, `Y`: Spatial coordinates `cell_type`: Cell type annotation Cell type probability columns (names in `commonCT`) Cell ID column (name specified in optim_params['cell_id_col'])	required
`aligned_df`	`DataFrame or MetaCell`	Aligned/moving dataset to match against reference. Same column requirements as `ref_df`. Can be a MetaCell object.	required
`commonCT`	`list of str`	Names of columns containing cell type probabilities or one-hot encodings. These are used to compute cell type distance.	required
`outprefix`	`str`	Output directory. If provided, saves: `matches_df.csv`: Final matched pairs `aligned_df.csv`, `ref_df.csv`: Filtered input data `var_out.npy`: Optimization variables and diagnostics `matching_model.lp`: Gurobi model file	`None`
`aligned_delaunay`	`(array - like, shape(n_triangles, 3))`	Precomputed Delaunay triangulation. If None, computed automatically. Useful when using metacells with pre-filtered triangulation.	`None`
`aligned_delaunay_vertex_col`	`str`	Column name containing vertex IDs that correspond to indices in `aligned_delaunay`. Required if triangulation uses non-default IDs.	`None`
`optim_params`	`dict`	Optimization parameters from `init_optim_params()`. Key parameters: `radius`: KNN search radius (default: 250) `knn`: Number of nearest neighbors (default: 8) `max_matches`: Max times ref can be matched (default: 1) `lazy_constraints`: Use lazy constraint generation (default: True) `delaunay_penalty`: Penalty for triangle flips (default: 5) `no_match_penalty`: Penalty for unmatched cells (default: 100)	`None`
`gurobi_params`	`dict`	Gurobi solver parameters from `init_gurobi_params()`. Key parameters: `time_limit`: Max solve time in seconds (default: 7200) `mip_gap`: Optimality gap tolerance (default: 0.05) `init_method`: MIP start method ('greedy', 'hungarian', or None)	`None`
`ignore_precomputed_triangulation`	`bool`	If True, compute fresh Delaunay even if precomputed one provided.	`False`

Returns:

Name	Type	Description
`matches_df`	`DataFrame`	Matched pairs with columns: `aligned_idx`, `ref_idx`: Row indices into filtered DataFrames `X`, `Y`: Coordinates of aligned points `ref_X`, `ref_Y`: Coordinates of matched reference points `size`, `ref_size`: Metacell sizes (1 for individual cells) `Ref_{cell_id_col}`, `Aligned_{cell_id_col}`: Original cell IDs `time_limit_reached`: Whether solver hit time limit `triangle_violation`: Whether point is in a flipped triangle `filtered_violation`: Violation detected by both methods `run_time`: Optimization time in seconds
`var_out`	`dict`	Optimization diagnostics including: `x`: Match variable values (list of 0/1) `no_match_vars`: Unmatched penalty values `penalty_vars`: Multi-match penalty values `area_penalty_vars`: Triangle flip penalty values `violations`: Spatial violation report `triangle_data`: Triangle analysis (areas, flips) `lazy_cuts_added`: Number of lazy constraints added

Notes

Objective Function:

The solver minimizes:

.. math::

\sum_{(i,j)} c_{ij} x_{ij} + \alpha \sum_j p_j + \beta \sum_i s_i n_i + \gamma \sum_t w_t q_t

where: - :math:c_{ij} = cell type distance + coordinate distance - :math:p_j = penalty for ref j matched more than once - :math:n_i = penalty for aligned i not matched, weighted by size :math:s_i - :math:q_t = penalty for triangle t orientation flip, weighted by :math:w_t

Lazy Constraints:

When lazy_constraints=True, triangle orientation constraints are added on-demand during optimization. This reduces memory from O(n×k³) to O(n) and is recommended for large problems (>1000 cells).

Space-Tearing:

Triangle orientation flips indicate space-tearing events where the spatial relationship between cells changes (e.g., due to tissue deformation or missing cells). The delaunay_penalty controls how strongly these are penalized.

Examples:

Basic usage:

>>> from src import run_same
>>> matches, var_out = run_same(
...     ref_df=ref_df,
...     aligned_df=aligned_df,
...     commonCT=['TypeA', 'TypeB', 'TypeC'],
...     outprefix='results/'
... )
>>> print(f"Found {len(matches)} matches")

With custom parameters:

>>> from src import run_same, init_optim_params, init_gurobi_params
>>> optim = init_optim_params(radius=500, lazy_constraints=True)
>>> gurobi = init_gurobi_params(time_limit=3600, mip_gap=0.01)
>>> matches, var_out = run_same(
...     ref_df=ref_df,
...     aligned_df=aligned_df,
...     commonCT=commonCT,
...     optim_params=optim,
...     gurobi_params=gurobi
... )

See Also

sliding_window_matching : Process large datasets in windows. init_optim_params : Create optimization parameters. init_gurobi_params : Create Gurobi parameters. greedy_triangle_collapse : Create metacells for faster optimization.

`src.same.sliding_window_matching(ref, moving, commonCT=None, outprefix=None, moving_delaunay=None, moving_delaunay_vertex_col=None, optim_params=None, gurobi_params=None, ignore_precomputed_triangulation=False)`

Match cells between reference and moving datasets using a sliding window approach.

This function divides the spatial domain into overlapping windows and runs the SAME optimization on each window. Results from overlapping regions are resolved using bipartite matching to ensure unique assignments.

Supports resumption from partial results when outprefix is provided.

Parameters:

Name	Type	Description	Default
`ref`	`DataFrame or MetaCell`	Reference dataset with columns ['X', 'Y', 'cell_type'] plus cell type probability columns. Can also be a MetaCell object with `metacell_df` attribute.	required
`moving`	`DataFrame or MetaCell`	Moving/aligned dataset to be matched to reference. Same column requirements as `ref`. Can be a MetaCell object with `metacell_df` and `metacell_delaunay` attributes.	required
`commonCT`	`list of str`	List of cell type column names (probability/one-hot columns). If None, inferred from unique values of 'cell_type' column.	`None`
`outprefix`	`str`	Output directory path. If provided, intermediate results are saved and the function can resume from partial runs.	`None`
`moving_delaunay`	`array - like`	Precomputed Delaunay triangulation for moving data as (n, 3) array of vertex indices. If moving is a MetaCell, this is extracted automatically.	`None`
`moving_delaunay_vertex_col`	`str`	Column name in moving DataFrame containing vertex IDs that correspond to indices in `moving_delaunay`. Required if triangulation uses non-sequential IDs.	`None`
`optim_params`	`dict`	Optimization parameters. See `init_optim_params()` for defaults. Key parameters include: `window_size`: Size of each window (default: 1000) `overlap`: Overlap between windows (default: 250) `radius`: KNN search radius (default: 250) `knn`: Number of nearest neighbors (default: 8) `lazy_constraints`: Use memory-efficient constraints (default: True)	`None`
`gurobi_params`	`dict`	Gurobi solver parameters. See `init_gurobi_params()` for defaults.	`None`
`ignore_precomputed_triangulation`	`bool`	If True, compute fresh Delaunay triangulation even if precomputed one is available.	`False`

Returns:

Type Description

DataFrame

Matched pairs with columns:

aligned_idx, ref_idx: Integer indices into input DataFrames
X, Y: Coordinates of aligned points
ref_X, ref_Y: Coordinates of matched reference points
Cell type probability columns
size, ref_size: Metacell sizes (if applicable)
window_id: Which window the match came from
time_limit_reached: Whether solver hit time limit
triangle_violation: Whether point is in a flipped triangle
run_time: Optimization time for the window

Examples:

Basic usage:

>>> from src import sliding_window_matching, init_optim_params
>>> matches = sliding_window_matching(
...     ref=ref_df,
...     moving=aligned_df,
...     commonCT=['TypeA', 'TypeB', 'TypeC'],
...     outprefix='results/'
... )

With custom parameters:

>>> optim = init_optim_params(window_size=500, overlap=100, radius=300)
>>> matches = sliding_window_matching(
...     ref=ref_df,
...     moving=aligned_df,
...     optim_params=optim
... )

With metacells:

>>> from src import MetaCell
>>> mc_aligned = MetaCell(aligned_df, max_metacell_size=10)
>>> matches = sliding_window_matching(ref=ref_df, moving=mc_aligned)

See Also

run_same : Core optimization for a single region. init_optim_params : Create optimization parameter dictionary. init_gurobi_params : Create Gurobi parameter dictionary. merge_window_matches_unique_ref : Merge overlapping window results.

`src.same.init_optim_params(**overrides)`

Create default optimization parameters for SAME matching.

Returns a dictionary of parameters controlling the matching problem formulation, spatial constraints, and sliding window behavior. Use keyword arguments to override specific defaults.

Parameters:

Name	Type	Description	Default
`**overrides`	`dict`	Keyword arguments to override default values.	`{}`

Returns:

Type Description

Dict[str, Any]

Dictionary of optimization parameters with the following keys:

Sliding window parameters:

window_size : int, default=1000 Size of each window in coordinate units.
overlap : int, default=250 Overlap between adjacent windows.
min_cells_per_window : int, default=10 Minimum cells required to process a window.

Matching problem parameters:

max_matches : int, default=1 Maximum times a reference point can be matched.
ref_metacell_match_multiplier : int or None, default=None For metacells, multiplier for max_matches. None = use max size.
radius : float, default=250 Search radius for finding KNN candidates.
knn : int, default=8 Number of nearest neighbors to consider.

Objective function coefficients:

penalty_coeff : float, default=100 Penalty for reference point matched more than once.
no_match_penalty : float, default=100 Penalty for unmatched aligned point (per cell).
delaunay_penalty : float, default=5 Penalty for triangle orientation flip (space-tearing).
dist_ct_coeff : float, default=1 Weight for cell type distance in objective.

Output labeling:

cell_id_col : str, default='Cell_Num_Old' Column name for cell identifiers.

Constraint/behavior flags:

hard_spatial_constraints : bool, default=False If True, strictly forbid triangle flips (no soft penalty).
ignore_same_type_triangles : bool, default=True If True, skip constraints for triangles with homogeneous cell types.
ignore_knn_if_matched : bool, default=False If True, use cell-type priority in KNN search.
lazy_constraints : bool, default=True If True, add spatial constraints lazily (memory efficient).

Triangle quality filtering:

min_angle_deg : float, default=15 Filter out thin triangles with minimum angle below this threshold.

Examples:

>>> params = init_optim_params(radius=500, knn=10, lazy_constraints=True)
>>> params['radius']
500

See Also

init_gurobi_params : Create Gurobi solver parameters. run_same : Main optimization function using these parameters. sliding_window_matching : Sliding window approach for large datasets.

`src.same.init_gurobi_params(**overrides)`

Create default Gurobi solver parameters for SAME optimization.

Returns a dictionary of Gurobi-related parameters that control solver behavior, time limits, and lazy constraint generation. Use keyword arguments to override specific defaults.

Parameters:

Name	Type	Description	Default
`**overrides`	`dict`	Keyword arguments to override default values.	`{}`

Returns:

Type Description

Dict[str, Any]

Dictionary of Gurobi parameters with the following keys:

Core solve controls:

time_limit : int, default=7200 Maximum solve time in seconds (2 hours).
mip_gap : float, default=0.05 MIP optimality gap tolerance (5%).

Gurobi tuning knobs:

mip_focus : int, default=2 MIPFocus parameter (2 = focus on proving optimality).
cuts : int, default=2 Cut generation aggressiveness (0=none to 3=aggressive).
heuristics : float, default=0.1 Fraction of time spent on heuristics (0.0-1.0).

MIP start / initialization:

init_method : str or None, default=None Method for warm-starting the solver:
- None: no initialization
- 'greedy': fast greedy assignment
- 'hungarian': optimal assignment (requires max_matches=1)
init_big_m : float, default=1e9 Large cost for forbidden pairs in Hungarian init.
init_hungarian_max_n : int, default=5000 Skip Hungarian init if n_aligned + n_ref exceeds this.

Lazy constraints:

lazy_max_cuts : int or None, default=None Global cap on lazy cuts added (None = unlimited).
lazy_allowed_flip_fraction : float, default=0.05 Allowed fraction of flipped triangles before adding cuts.
lazy_max_cuts_per_incumbent : int, default=1000 Max cuts added per incumbent solution.

Examples:

>>> params = init_gurobi_params(time_limit=3600, mip_gap=0.01)
>>> params['time_limit']
3600

See Also

init_optim_params : Create optimization parameters. run_same : Main optimization function using these parameters.