The Stereo Baseline Problem: Why One Number Does Not Fit Every Shot
Stereo baseline is the single most important parameter in 2D-to-3D conversion, and most tools get it wrong by making it global instead of per-shot.
When you convert a 2D image to stereoscopic 3D, the most consequential decision is the stereo baseline — the simulated distance between the left and right virtual cameras. Too small, and the 3D effect is imperceptible. Too large, and viewers get eye strain within minutes. The correct value depends on the scene, not the project.
This is where most automated conversion tools fail. They apply a single baseline value to an entire video. A wide landscape shot and a close-up portrait receive the same interaxial distance, producing a result that feels either flat or painfully deep depending on which kind of shot the parameter was tuned for. Professional stereographers have known this for decades: baseline must track subject distance.
The fix isn't complicated in principle. You estimate the depth range of each shot, compute a baseline that maps that range onto a comfortable disparity budget (typically 1-2% of screen width for the nearest object), and apply that baseline per shot rather than per project. In practice, this requires reliable per-frame depth estimation, which is exactly what modern AI models finally provide. The pipeline becomes: estimate depth, compute the optimal baseline per shot, synthesize stereo with that baseline, verify the disparity budget, and adjust and re-render if the shot is out of spec.
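The baseline computation and the budget check can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular tool does it: it assumes a pinhole camera model with parallel virtual cameras, metric depth values, a focal length expressed in pixels, and a convergence distance chosen elsewhere. The function names and the 1.5% default budget are hypothetical and chosen only for this example.

```python
def robust_depth_range(depths, trim=0.02):
    """Near/far depth of a shot after trimming outliers at both ends."""
    values = sorted(depths)
    if not values:
        raise ValueError("no depth samples")
    k = int(len(values) * trim)
    return values[k], values[-1 - k]


def disparity_px(z, baseline, focal_px, z_conv):
    """Signed screen disparity in pixels; negative is in front of the screen.

    Assumed model: parallel cameras converged by image shift at z_conv.
    """
    return focal_px * baseline * (1.0 / z_conv - 1.0 / z)


def baseline_for_budget(z_near, z_conv, focal_px, width_px, budget_frac=0.015):
    """Baseline that places the nearest object exactly on the disparity budget."""
    if not 0 < z_near < z_conv:
        raise ValueError("expected 0 < z_near < z_conv")
    budget_px = budget_frac * width_px
    return budget_px / (focal_px * (1.0 / z_near - 1.0 / z_conv))


def within_budget(z_near, baseline, focal_px, z_conv, width_px, budget_frac=0.015):
    """Verification step: is the nearest object's disparity inside the budget?"""
    limit = budget_frac * width_px
    # Small tolerance absorbs floating-point rounding at the exact limit.
    return abs(disparity_px(z_near, baseline, focal_px, z_conv)) <= limit + 1e-9


# Two hypothetical shots at 1920 px width with a 1500 px focal length.
portrait = baseline_for_budget(z_near=1.0, z_conv=2.0, focal_px=1500, width_px=1920)
landscape = baseline_for_budget(z_near=20.0, z_conv=40.0, focal_px=1500, width_px=1920)
print(f"portrait {portrait:.4f} m, landscape {landscape:.4f} m")
```

With these made-up numbers the close-up wants a baseline of roughly 0.038 m and the landscape roughly 0.77 m, a twentyfold difference. Feeding the landscape baseline into within_budget for the close-up fails the check, which is the failure mode of a global setting in miniature.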
What makes this hard to do manually is the sheer number of decisions. A 90-minute film has hundreds of shots. Each needs its own baseline calculation, its own convergence point, its own comfort check. This is precisely the kind of repetitive-but-consequential work that benefits from automation with human oversight — not full automation, and not full manual control, but a pipeline that proposes parameters and lets you override the ones that matter.
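One way to picture "propose, then override" is a table of per-shot parameters that a human can patch selectively. The sketch below is hypothetical: ShotParams, its fields, and apply_overrides are names invented for illustration, and a real tool would carry more parameters per shot.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ShotParams:
    baseline: float       # virtual camera separation, in scene units
    convergence: float    # distance of the zero-disparity plane
    source: str = "auto"  # "auto" if proposed by the pipeline, "manual" if overridden


def apply_overrides(proposed, overrides):
    """Merge human overrides into the proposed per-shot parameters.

    proposed:  dict mapping shot id -> ShotParams
    overrides: dict mapping shot id -> dict of fields to change
    Shots without an override keep the proposed values untouched.
    """
    unknown = set(overrides) - set(proposed)
    if unknown:
        raise KeyError(f"overrides for unknown shots: {sorted(unknown)}")
    final = {}
    for shot_id, params in proposed.items():
        changes = overrides.get(shot_id)
        final[shot_id] = replace(params, source="manual", **changes) if changes else params
    return final


# The pipeline proposes values for every shot; a person adjusts only shot 2.
proposed = {1: ShotParams(0.04, 2.0), 2: ShotParams(0.77, 40.0)}
final = apply_overrides(proposed, {2: {"baseline": 0.5}})
```

Keeping the proposals immutable and recording which shots were touched by hand means the automated values can be regenerated later without silently discarding a stereographer's decisions.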
The deeper insight is that stereo conversion is not a single operation. It is a chain of dependent decisions, each of which must be tuned to the content. Baseline is just the most visible one. Convergence plane selection, edge inpainting strategy, temporal smoothing strength — all of these vary by shot. A good conversion tool doesn't hide this complexity. It exposes it in a way that lets you intervene where it matters and trust the defaults everywhere else.