Computer-vision pipeline for exam grading

Backend / Computer Vision

Problem

Process large volumes of answer sheets accurately, without saturating the server CPU or requiring manual per-sheet calibration.

Impact

Stable throughput, no more CPU spikes, and detection that adjusts itself without per-document configuration.

The problem

The service graded answer sheets via optical mark recognition (OMR). As it scaled, two serious problems appeared: processing pushed CPU utilization to 96%, and detection relied on calibrating each sheet by hand, which did not scale.

The architecture

  • FastAPI service with explicit endpoint versioning, to evolve the contract without breaking consumers.
  • OpenCV pipeline with zone-based adaptive thresholding: detection adapts itself to each region of the sheet, with no manual configuration.
  • Worker pool to process PDFs in parallel.
  • Hardened Docker containers (non-root user) to run safely in production.

Key decisions

  • The root cause of the 96% CPU usage was oversubscription: each process in the pool also spawned its own threads inside OpenCV’s numerical libraries (BLAS/TBB/OpenMP), so N processes × M threads competed for the same cores. The fix was to cap those libraries’ thread pools explicitly, which ended that contention.
  • Adaptive detection over manual calibration: removed repetitive per-sheet work and made the system robust to variation.
  • Non-root container hardening: required aligning volume permissions and a fixed UID, but closed a real attack surface.

Impact

CPU usage stopped spiking, throughput became predictable, and grading no longer depended on manual per-document tuning.