Quick Start
This guide will get you running SAME in under 5 minutes.
Input Data Format
SAME requires two DataFrames with the following columns:
| Column | Required | Description |
|---|---|---|
X, Y |
Yes | Spatial coordinates |
cell_type |
Yes | Cell type annotation |
Cell_Num_Old |
Yes | Unique cell identifier |
| Cell type columns | Yes | Probability or one-hot columns (e.g., TypeA, TypeB) |
Example data structure:
import pandas as pd
# Reference data (fixed)
ref_df = pd.DataFrame({
'Cell_Num_Old': [0, 1, 2, 3],
'X': [100, 150, 200, 250],
'Y': [100, 120, 100, 130],
'cell_type': ['TypeA', 'TypeB', 'TypeA', 'TypeB'],
'TypeA': [0.9, 0.1, 0.85, 0.15], # Probability columns
'TypeB': [0.1, 0.9, 0.15, 0.85],
})
# Aligned/moving data (to be matched)
aligned_df = pd.DataFrame({
'Cell_Num_Old': [0, 1, 2, 3, 4],
'X': [105, 155, 205, 260, 180],
'Y': [95, 125, 105, 125, 110],
'cell_type': ['TypeA', 'TypeB', 'TypeA', 'TypeB', 'TypeA'],
'TypeA': [0.88, 0.12, 0.9, 0.2, 0.82],
'TypeB': [0.12, 0.88, 0.1, 0.8, 0.18],
})
Basic Usage
Simple Matching
from src import run_same
# Run SAME optimization
matches, var_out = run_same(
ref_df=ref_df,
aligned_df=aligned_df,
commonCT=['TypeA', 'TypeB'], # Cell type columns
outprefix='results/'
)
# View results
print(f"Found {len(matches)} matches")
print(matches[['aligned_idx', 'ref_idx', 'X', 'Y', 'ref_X', 'ref_Y']])
With Custom Parameters
from src import run_same, init_optim_params, init_gurobi_params
# Customize optimization parameters
optim = init_optim_params(
radius=300, # Search radius for KNN
knn=10, # Number of nearest neighbors
max_matches=1, # Each ref can be matched once
lazy_constraints=True # Memory-efficient mode
)
# Customize Gurobi parameters
gurobi = init_gurobi_params(
time_limit=3600, # 1 hour timeout
mip_gap=0.01 # 1% optimality gap
)
matches, var_out = run_same(
ref_df=ref_df,
aligned_df=aligned_df,
commonCT=['TypeA', 'TypeB'],
optim_params=optim,
gurobi_params=gurobi
)
Large Datasets (1000+ cells)
For larger datasets, use sliding windows:
from src import sliding_window_matching, init_optim_params
optim = init_optim_params(
window_size=1000,
overlap=250,
radius=250,
lazy_constraints=True
)
matches = sliding_window_matching(
ref=ref_df,
moving=aligned_df,
commonCT=['TypeA', 'TypeB'],
optim_params=optim,
outprefix='results/'
)
Very Large Datasets (10,000+ cells)
For very large datasets, use metacells to reduce problem size:
from src import greedy_triangle_collapse, run_same, unpack_metacell_matches
# Step 1: Create metacells
mc_aligned, tri = greedy_triangle_collapse(
aligned_df,
max_metacell_size=3, # Max cells per metacell
r_max=500, # Remove long edges
)
print(f"Reduced from {len(aligned_df)} to {len(mc_aligned)} metacells")
# Step 2: Run SAME on metacells
matches, var_out = run_same(
ref_df=ref_df,
aligned_df=mc_aligned,
commonCT=['TypeA', 'TypeB'],
cell_id_col='Cell_Num' # Metacells use Cell_Num
)
# Step 3: Unpack to individual cells
individual_matches = unpack_metacell_matches(
matches,
mc_aligned,
ref_df
)
Output Format
The matches DataFrame contains:
| Column | Description |
|---|---|
aligned_idx |
Row index in aligned_df |
ref_idx |
Row index in ref_df |
X, Y |
Coordinates of aligned point |
ref_X, ref_Y |
Coordinates of matched ref point |
Aligned_Cell_Num_Old |
Original aligned cell ID |
Ref_Cell_Num_Old |
Original ref cell ID |
triangle_violation |
Whether point is in a flipped triangle |
run_time |
Optimization time (seconds) |
The var_out dictionary contains optimization diagnostics.
Next Steps
- Algorithm Overview - Understand how SAME works
- Metacells - Learn about graph simplification
- Parameters - Full parameter reference
- Tutorials - Detailed examples