Signal Processing Mathematics OpenCV Image Processing

The Canny Edge Detector: Signal Theory and Implementation

Understanding Canny edge detection through the lens of signal processing, optimal filter design, and the mathematics of non-maximum suppression.

January 8, 2024 · 3 min read

The Canny edge detector (1986) remains one of the most elegant algorithms in computer vision. Its three design criteria (good detection, good localization, and a single response per edge) translate directly into mathematical constraints.

The Optimal Edge Filter

Canny formulated edge detection as an optimization problem. He derived that the optimal 1D filter maximizing the signal-to-noise ratio (SNR) and localization simultaneously is well-approximated by the first derivative of a Gaussian:

$$f(x) = -\frac{d}{dx} G_\sigma(x) = \frac{x}{\sigma^2} e^{-x^2 / 2\sigma^2}$$

Here $G_\sigma(x) = e^{-x^2 / 2\sigma^2}$ is the Gaussian with its normalization constant omitted.
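As a rough illustration, the filter above can be sampled on an integer pixel grid. The function name `dog_kernel` and the cutoff at about three standard deviations are assumptions made for this sketch, not part of Canny's derivation.

```python
import numpy as np


def dog_kernel(sigma: float, truncate: float = 3.0) -> np.ndarray:
    """Sample f(x) = x / sigma^2 * exp(-x^2 / (2 sigma^2)) at integer x."""
    radius = int(np.ceil(truncate * sigma))  # assumed cutoff, in pixels
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    return x / sigma**2 * np.exp(-x**2 / (2 * sigma**2))


kernel = dog_kernel(2.0)  # 13 taps for sigma = 2
```

The kernel is antisymmetric, so it sums to zero and gives no response on constant regions. Its extrema sit at $x = \pm\sigma$, which is where the filter weights the signal most heavily.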

The 2D extension computes the gradient magnitude and direction:

$$\|\nabla I\| = \sqrt{G_x^2 + G_y^2}, \quad \theta = \arctan\!\left(\frac{G_y}{G_x}\right)$$

where $G_x = \frac{\partial}{\partial x}(G_\sigma * I)$ and $G_y = \frac{\partial}{\partial y}(G_\sigma * I)$.
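A minimal NumPy sketch of these two quantities, assuming a separable implementation (a Gaussian along one axis, its derivative along the other) and edge-replicated borders. The name `gaussian_gradient` and the three-sigma kernel radius are illustrative choices. It uses `np.arctan2`, the quadrant-aware form of the arctangent above, which also avoids dividing by zero when $G_x = 0$.

```python
import numpy as np


def gaussian_gradient(image, sigma: float = 1.4):
    """Return (magnitude, theta) of the Gaussian-smoothed image gradient."""
    radius = int(np.ceil(3 * sigma))  # assumed kernel half-width
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()              # smoothing kernel with unit DC gain
    dg = -x / sigma**2 * g    # sampled derivative of that Gaussian

    img = np.asarray(image, dtype=np.float64)
    padded = np.pad(img, radius, mode="edge")  # replicate border pixels

    def conv_x(a, k):  # 1D convolution along each row
        return np.apply_along_axis(np.convolve, 1, a, k, mode="valid")

    def conv_y(a, k):  # 1D convolution along each column
        return np.apply_along_axis(np.convolve, 0, a, k, mode="valid")

    gx = conv_y(conv_x(padded, dg), g)  # differentiate in x, smooth in y
    gy = conv_y(conv_x(padded, g), dg)  # smooth in x, differentiate in y
    return np.hypot(gx, gy), np.arctan2(gy, gx)
```

On an image whose intensity increases by one gray level per pixel from left to right, the magnitude away from the borders is close to 1 and the direction is close to 0 radians.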

Gaussian Scale and the Uncertainty Principle

The Gaussian standard deviation $\sigma$ controls the trade-off between noise robustness and localization precision. This follows from the uncertainty principle for Fourier transform pairs, the signal-processing counterpart of the Heisenberg uncertainty principle:

$$\sigma_x \cdot \sigma_\omega \geq \frac{1}{2}$$

A wider $\sigma$ in the spatial domain (more smoothing) means a narrower bandwidth in frequency: better noise rejection, but blurred edges. Typical values are $\sigma \in [1, 3]$ pixels.
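The bound can be checked numerically. The sketch below assumes the spreads are measured as standard deviations of the energy densities $|g|^2$ and $|\hat{g}|^2$, the usual convention for this inequality. The function name and grid parameters are illustrative. Under that convention a Gaussian meets the bound with equality, whatever its $\sigma$.

```python
import numpy as np


def spread_product(sigma: float, n: int = 4096, dx: float = 0.05) -> float:
    """Estimate sigma_x * sigma_omega for a sampled Gaussian of width sigma."""
    x = (np.arange(n) - n // 2) * dx
    g = np.exp(-x**2 / (2 * sigma**2))

    # Spatial spread: standard deviation of the normalized density |g|^2
    p_x = g**2
    p_x /= p_x.sum()
    sigma_x = np.sqrt(np.sum(p_x * x**2))

    # Frequency spread: same measure on |FFT(g)|^2, in angular frequency
    spectrum = np.fft.fftshift(np.fft.fft(g))
    omega = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(n, d=dx))
    p_w = np.abs(spectrum) ** 2
    p_w /= p_w.sum()
    sigma_w = np.sqrt(np.sum(p_w * omega**2))

    return float(sigma_x * sigma_w)


products = [spread_product(s) for s in (1.0, 2.0, 3.0)]  # each close to 0.5
```

Doubling $\sigma$ doubles the spatial spread and halves the frequency spread, which is the smoothing versus localization trade-off in numerical form.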

Non-Maximum Suppression

After gradient computation, edges are thinned by keeping only local maxima along the gradient direction. For a pixel at $(x, y)$ with gradient direction $\theta$:

$$\text{NMS}(x,y) = \begin{cases} \|\nabla I(x,y)\| & \text{if } \|\nabla I\| \geq \|\nabla I\|_{\pm\theta} \\ 0 & \text{otherwise} \end{cases}$$

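Here $\|\nabla I\|_{\pm\theta}$ denotes the gradient magnitude at the two neighbors one step along $\pm\theta$. Because these neighbors generally fall between pixel centers, their magnitudes are obtained by bilinear interpolation between the discrete neighbors for sub-pixel accuracy. A common simplification is to quantize $\theta$ to four directions (0°, 45°, 90°, 135°) instead.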
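The thinning step can be sketched as follows. To stay short, this version quantizes the gradient direction to four sectors rather than interpolating, so it is a simplified variant of the rule above. The name `non_max_suppression` is illustrative, `theta` is assumed to be in radians with $y$ increasing down the rows, and border pixels are simply left at zero.

```python
import numpy as np


def non_max_suppression(mag: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Keep pixels that are local maxima along the quantized gradient direction."""
    h, w = mag.shape
    out = np.zeros_like(mag)

    # Fold direction into [0, 180) degrees and snap to 0, 45, 90 or 135
    angle = np.rad2deg(theta) % 180.0
    sector = np.round(angle / 45.0).astype(int) % 4

    # (dy, dx) step along the gradient for each sector
    offsets = {0: (0, 1), 1: (1, 1), 2: (1, 0), 3: (1, -1)}

    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dy, dx = offsets[int(sector[y, x])]
            m = mag[y, x]
            # Compare against both neighbors along +theta and -theta
            if m >= mag[y + dy, x + dx] and m >= mag[y - dy, x - dx]:
                out[y, x] = m
    return out
```

For a vertical ridge whose magnitude profile across each row is `[0, 1, 3, 1, 0]` with `theta = 0`, only the center column survives, which is the one-pixel-wide edge the single-response criterion asks for.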

Hysteresis Thresholding

Two thresholds $T_H > T_L$ create three pixel classes:

  • $\|\nabla I\| > T_H$ → strong edge (keep)
  • $T_L \leq \|\nabla I\| \leq T_H$ → weak edge (keep if connected to a strong edge)
  • $\|\nabla I\| < T_L$ → suppressed

Connectivity is verified via 8-connected BFS/DFS. A typical ratio $T_H / T_L$ lies between 2 and 3.
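A minimal sketch of the three classes and the connectivity pass, assuming the input is the thinned gradient magnitude and using a breadth-first search seeded at the strong pixels. The function name `hysteresis` and the boolean output mask are choices made for this example.

```python
from collections import deque

import numpy as np


def hysteresis(nms: np.ndarray, t_low: float, t_high: float) -> np.ndarray:
    """Return a boolean edge mask: strong pixels plus weak pixels linked to them."""
    strong = nms > t_high
    weak = (nms >= t_low) & ~strong
    edges = strong.copy()
    h, w = nms.shape

    # Breadth-first search outward from every strong pixel
    queue = deque(zip(*np.nonzero(strong)))
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):          # visit the 8-connected neighborhood
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and weak[ny, nx] and not edges[ny, nx]):
                    edges[ny, nx] = True   # promote the weak pixel
                    queue.append((ny, nx))
    return edges
```

Each pixel enters the queue at most once, so the pass is linear in the number of pixels. A weak pixel with no chain of weak or strong neighbors leading to a strong pixel is never visited and stays suppressed.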

Implementation

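```python
import cv2
import numpy as np


def canny_detect(image_path: str, sigma: float = 1.4) -> np.ndarray:
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:  # cv2.imread returns None instead of raising
        raise FileNotFoundError(f"Could not read image: {image_path}")
    # Gaussian smoothing; kernel size must be odd (covers roughly ±3 sigma)
    ksize = int(6 * sigma + 1) | 1
    blurred = cv2.GaussianBlur(img, (ksize, ksize), sigma)
    # Gradient magnitude, used only to choose the thresholds.
    # Same 3x3 Sobel aperture and L2 norm that cv2.Canny uses below.
    sobelx = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)
    sobely = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.hypot(sobelx, sobely)
    # Heuristic thresholds: fixed fractions of the peak magnitude
    # (not Otsu's method), with T_H / T_L = 2
    t_high = 0.2 * mag.max()
    t_low = 0.5 * t_high
    edges = cv2.Canny(blurred, t_low, t_high, L2gradient=True)
    return edges
```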

Note: `L2gradient=True` uses the exact $\ell_2$ norm $\sqrt{G_x^2 + G_y^2}$ instead of the $\ell_1$ approximation $|G_x| + |G_y|$. This keeps the magnitude consistent across edge orientations, at roughly 10% extra compute cost.
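The difference between the two norms is easy to see on synthetic gradients of unit length. This is an illustrative calculation in plain NumPy, not a call into OpenCV.

```python
import numpy as np

# Unit-length gradients pointing at 0, 45 and 90 degrees
angles = np.deg2rad([0.0, 45.0, 90.0])
gx, gy = np.cos(angles), np.sin(angles)

l2 = np.hypot(gx, gy)          # exact magnitude: 1 for every direction
l1 = np.abs(gx) + np.abs(gy)   # approximation: 1 on the axes, sqrt(2) at 45 degrees
```

With the $\ell_1$ form, a diagonal edge looks about 41% stronger than an axis-aligned edge of the same contrast, so a fixed threshold treats the two orientations differently.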

When to Use Canny vs. Alternatives

| Algorithm | Best For |
|---|---|
| Canny | General purpose, thin clean edges |
| Sobel/Prewitt | Fast gradient estimation, not full edge detection |
| Laplacian of Gaussian | Blob detection, scale-space analysis |
| Structured Forests | Complex textures, learned features |
| Segment Anything (SAM) | Semantic boundaries in natural images |

For industrial vision, where edges correspond to physical object boundaries and SNR is high, Canny remains the most reliable and interpretable choice.