The Canny Edge Detector: Signal Theory and Implementation

The Canny edge detector (1986) remains one of the most elegant algorithms in computer vision. Its design criteria — good detection, good localization, single response — translate directly into mathematical constraints.

The Optimal Edge Filter

Canny formulated edge detection as an optimization problem. For a 1D step edge in additive white Gaussian noise, the filter that maximizes the product of signal-to-noise ratio (SNR) and localization, subject to a constraint on multiple responses, is well approximated by the first derivative of a Gaussian:

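f(x) = -\frac{d}{dx} G_\sigma(x) = \frac{x}{\sigma^2}\, G_\sigma(x) = \frac{x}{\sqrt{2\pi}\,\sigma^3}\, e^{-x^2 / (2\sigma^2)}

where $G_\sigma(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-x^2 / (2\sigma^2)}$ is the unit-area Gaussian.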
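As a rough illustration, the filter can be sampled on an integer pixel grid with NumPy. The function name `dog_kernel_1d` and the `truncate` cutoff (about three standard deviations) are assumptions made for this sketch, and $G_\sigma$ is assumed to be the unit-area Gaussian.

```python
import numpy as np

def dog_kernel_1d(sigma: float, truncate: float = 3.0) -> np.ndarray:
    """Sample f(x) = -dG_sigma/dx on integer positions within +/- truncate*sigma."""
    radius = int(np.ceil(truncate * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    return x / (np.sqrt(2.0 * np.pi) * sigma**3) * np.exp(-x**2 / (2.0 * sigma**2))

kernel = dog_kernel_1d(sigma=1.4)

# Correlate the kernel with a unit step edge located between indices 19 and 20.
step = np.concatenate([np.zeros(20), np.ones(20)])
response = np.correlate(step, kernel, mode="same")
peak_index = int(np.argmax(response))
```

Because the kernel is antisymmetric, it gives no response inside a constant region, while the response to the step peaks at the pixels adjacent to the transition.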

The 2D extension computes the gradient magnitude and direction:

\|\nabla I\| = \sqrt{G_x^2 + G_y^2}, \quad \theta = \operatorname{atan2}(G_y, G_x)

where $G_x = \frac{\partial}{\partial x}(G_\sigma * I)$ and $G_y = \frac{\partial}{\partial y}(G_\sigma * I)$.
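The following is a minimal NumPy-only sketch of this step, assuming a 2D grayscale array whose axis 0 is $y$ (rows) and axis 1 is $x$ (columns). The name `gaussian_gradient`, the cutoff at three standard deviations, and the edge-replicating border handling are choices made for this example. It uses `np.arctan2` so that the angle stays defined when $G_x = 0$.

```python
import numpy as np

def gaussian_gradient(image: np.ndarray, sigma: float = 1.4):
    """Return (magnitude, direction) of the Gaussian-smoothed image gradient."""
    radius = int(np.ceil(3.0 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    g /= g.sum()                  # smoothing kernel, normalized to unit sum
    dg = -x / sigma**2 * g        # sampled derivative dG/dx

    def filter_axis(arr, kernel, axis):
        # True convolution along one axis with edge-replicated padding.
        pad = [(0, 0), (0, 0)]
        pad[axis] = (radius, radius)
        padded = np.pad(arr, pad, mode="edge")
        return np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="valid"), axis, padded
        )

    img = image.astype(np.float64)
    gx = filter_axis(filter_axis(img, g, 0), dg, 1)  # smooth along y, differentiate along x
    gy = filter_axis(filter_axis(img, g, 1), dg, 0)  # smooth along x, differentiate along y
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)                   # radians, full-quadrant angle
    return magnitude, direction

# Usage: a vertical step edge whose intensity increases with x.
img = np.zeros((32, 32))
img[:, 16:] = 1.0
mag, theta = gaussian_gradient(img, sigma=1.4)
```

Separating the 2D filter into two 1D passes is possible because the Gaussian is separable, which keeps the cost linear in the kernel radius rather than quadratic.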

Gaussian Scale and the Uncertainty Principle

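The Gaussian standard deviation $\sigma$ controls the trade-off between noise robustness and localization precision. The trade-off mirrors the Fourier (Gabor) uncertainty relation, the signal-processing counterpart of the Heisenberg uncertainty principle, which bounds the product of a signal's spatial spread $\sigma_x$ and its spectral spread $\sigma_\omega$: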

\sigma_x \cdot \sigma_\omega \geq \frac{1}{2}

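The Gaussian attains this bound with equality, which is one reason it is the standard smoothing kernel. A wider $\sigma$ in the spatial domain (more smoothing) means a narrower bandwidth in frequency: better noise rejection, but blurred and less precisely localized edges. Typical values are $\sigma \in [1, 3]$ pixels.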
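The bound can be checked numerically for a Gaussian. The sketch below is an illustration under stated assumptions: the spreads are taken as standard deviations of the energy densities $|g|^2$ and $|\hat{g}|^2$, frequency is angular, and the function name, grid size `n`, and spacing `dx` are arbitrary choices for this example.

```python
import numpy as np

def uncertainty_product(s: float, n: int = 4096, dx: float = 0.05) -> float:
    """Return sigma_x * sigma_omega for g(x) = exp(-x^2 / (2 s^2))."""
    x = (np.arange(n) - n // 2) * dx
    g = np.exp(-x**2 / (2.0 * s**2))

    def spread(values, weights):
        # Standard deviation of `values` under the normalized density `weights`.
        p = weights / weights.sum()
        mean = np.sum(values * p)
        return np.sqrt(np.sum((values - mean) ** 2 * p))

    sigma_x = spread(x, np.abs(g) ** 2)
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)   # angular frequency grid
    sigma_omega = spread(omega, np.abs(np.fft.fft(g)) ** 2)
    return sigma_x * sigma_omega

products = {s: uncertainty_product(s) for s in (1.0, 2.0, 3.0)}
```

Each product should come out very close to 0.5 regardless of `s`: widening the Gaussian in space narrows its spectrum by the same factor.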

Non-Maximum Suppression

After gradient computation, edges are thinned by keeping only local maxima in the gradient direction. For a pixel at $(x, y)$ with gradient direction $\theta$:

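\text{NMS}(x,y) = \begin{cases} \|\nabla I(x,y)\| & \text{if } \|\nabla I(x,y)\| \geq \|\nabla I(x \pm \cos\theta,\; y \pm \sin\theta)\| \\ 0 & \text{otherwise} \end{cases}

The condition must hold for both signs, that is, for the point one step ahead along the gradient and the point one step behind.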

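Because the two comparison points generally fall between pixel centers, their magnitudes are obtained by bilinear interpolation between the discrete neighbors. A cheaper and widely used alternative is to quantize $\theta$ to one of four directions (0°, 45°, 90°, 135°) and compare against the two grid neighbors in that direction.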
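The sketch below illustrates the thinning step. Note that it does not interpolate: for brevity it swaps in the simpler approach of quantizing $\theta$ into four directions (0°, 45°, 90°, 135°) and comparing against grid neighbors. The function name and the convention that `theta` is in radians with rows increasing along $y$ are assumptions of this example.

```python
import numpy as np

def non_max_suppression(mag: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Keep a pixel only if its magnitude is >= both neighbors along the gradient."""
    # Quantize the direction (modulo 180 degrees) to 0, 45, 90, or 135 degrees.
    angle = np.rad2deg(theta) % 180.0
    sector = np.round(angle / 45.0).astype(int) % 4
    # (row, col) step toward the "ahead" neighbor for each quantized direction.
    steps = {0: (0, 1), 1: (1, 1), 2: (1, 0), 3: (1, -1)}

    padded = np.pad(mag, 1, mode="constant")  # zero border for image-edge pixels
    h, w = mag.shape
    out = np.zeros_like(mag)
    for s, (dr, dc) in steps.items():
        ahead = padded[1 + dr : 1 + dr + h, 1 + dc : 1 + dc + w]
        behind = padded[1 - dr : 1 - dr + h, 1 - dc : 1 - dc + w]
        keep = (sector == s) & (mag >= ahead) & (mag >= behind)
        out[keep] = mag[keep]
    return out

# Usage: a three-pixel-wide ridge with a horizontal gradient is thinned to its crest.
mag = np.zeros((5, 7))
mag[:, 2], mag[:, 3], mag[:, 4] = 1.0, 3.0, 2.0
thin = non_max_suppression(mag, np.zeros_like(mag))
```

Quantization trades some accuracy on edges at intermediate orientations for a loop-free, fully vectorized comparison.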

Hysteresis Thresholding

Two thresholds $T_H > T_L$ create three pixel classes:

$\|\nabla I\| > T_H$ → strong edge (keep)
$T_L \leq \|\nabla I\| \leq T_H$ → weak edge (keep if connected to a strong edge)
$\|\nabla I\| < T_L$ → suppressed

Connectivity is checked with an 8-connected BFS or DFS seeded at the strong pixels. Canny recommended a ratio $T_H / T_L$ between 2 and 3.
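The three classes and the connectivity check can be sketched as a breadth-first search seeded at the strong pixels. This is a plain-Python illustration rather than an optimized implementation, and the function name `hysteresis` is chosen for this example.

```python
from collections import deque

import numpy as np

def hysteresis(mag: np.ndarray, t_low: float, t_high: float) -> np.ndarray:
    """Return a boolean edge map: strong pixels plus weak pixels 8-connected to them."""
    strong = mag > t_high
    candidate = mag >= t_low                 # strong or weak
    edges = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))  # BFS seeds: every strong pixel
    h, w = mag.shape
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < h and 0 <= cc < w
                if inside and candidate[rr, cc] and not edges[rr, cc]:
                    edges[rr, cc] = True     # weak pixel reached from a strong one
                    queue.append((rr, cc))
    return edges

# Usage: the weak pixels in column 1 connect to the strong pixel at (2, 1);
# the isolated weak pixel at (0, 4) is discarded.
mag = np.array([[0, 5, 0, 0, 5],
                [0, 5, 0, 0, 0],
                [0, 9, 0, 0, 0]], dtype=float)
edges = hysteresis(mag, t_low=4.0, t_high=8.0)
```

Seeding the search at strong pixels means each pixel is visited at most once, so the cost is linear in the number of candidate pixels.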

Implementation

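import cv2
import numpy as np

def canny_detect(image_path: str, sigma: float = 1.4) -> np.ndarray:
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    # Gaussian smoothing; the kernel spans about +/- 3 sigma and its size must be odd
    ksize = int(6 * sigma + 1) | 1
    blurred = cv2.GaussianBlur(img, (ksize, ksize), sigma)
    # Gradient magnitude from the same 3x3 Sobel operator that cv2.Canny uses
    # by default, so the thresholds are on the scale of Canny's internal gradient
    sobelx = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)
    sobely = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.hypot(sobelx, sobely)
    # Heuristic thresholds (not Otsu): 20% of the peak magnitude, with T_H / T_L = 2
    t_high = 0.2 * mag.max()
    t_low = 0.5 * t_high
    # cv2.Canny does not smooth internally, so pass the blurred image
    edges = cv2.Canny(blurred, t_low, t_high, L2gradient=True)
    return edges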

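Note: L2gradient=True uses the exact $\ell_2$ norm $\sqrt{G_x^2 + G_y^2}$ instead of the default $\ell_1$ approximation $|G_x| + |G_y|$. The $\ell_1$ form overestimates the magnitude of diagonal gradients by up to a factor of $\sqrt{2}$, so the $\ell_2$ norm makes thresholding independent of edge orientation, at a modest extra compute cost.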
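To see how far the $\ell_1$ approximation can deviate from the $\ell_2$ norm, the short check below evaluates both on unit-magnitude gradients at orientations from 0° to 90°. It is a standalone numerical illustration and does not call OpenCV.

```python
import numpy as np

# Unit-magnitude gradients at orientations 0, 1, ..., 90 degrees.
degrees = np.arange(0, 91)
angles = np.deg2rad(degrees)
gx, gy = np.cos(angles), np.sin(angles)

l2 = np.hypot(gx, gy)           # exact magnitude: 1 at every orientation
l1 = np.abs(gx) + np.abs(gy)    # approximation: depends on orientation
ratio = l1 / l2

worst_degree = int(degrees[np.argmax(ratio)])  # orientation with the largest overestimate
worst_ratio = float(ratio.max())
```

The overestimate is largest at 45°, where the ratio reaches $\sqrt{2}$, so with the $\ell_1$ form a fixed threshold is effectively lower for diagonal edges than for axis-aligned ones.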

When to Use Canny vs. Alternatives

| Algorithm | Best For |
|---|---|
| Canny | General purpose, thin clean edges |
| Sobel/Prewitt | Fast gradient estimation, not full edge detection |
| Laplacian of Gaussian | Blob detection, scale-space analysis |
| Structured Forests | Complex textures, learned features |
| Segment Anything (SAM) | Semantic boundaries in natural images |

For industrial vision — where edges correspond to physical object boundaries and SNR is high — Canny remains the most reliable and interpretable choice.