video_quiz_routes.py
Class: VideoQuizRoutesModule
Purpose: Handles kids video discovery, final question retrieval, answer grading (RapidFuzz + AI fallback), audio transcription via Whisper, and shared frontend configuration.
Fields:
- router_video_quiz: APIRouter — Router for kids-facing video endpoints (public discovery routes).
- router_api: APIRouter — Router for shared API endpoints (grading, transcription, config).
- DOWNLOADS_DIR: Path — Root directory for downloaded video assets (module invariant: all video data is read from here).
- BASE_DIR: Path — Base project directory (used to write cached JSON).
- GRADING_CONFIG: Dict[str, Any] — Centralized grading thresholds and AI configuration.
- NUM_WORDS: Dict[str, int] — Maps number words to integers (used for numeric extraction).
- SCALE_WORDS: Dict[str, int] — Numeric scale words (hundred, thousand, etc.).
- STOPWORDS: Set[str] — Common words removed during normalization.
- FILLER_WORDS: Set[str] — Speech filler words removed during normalization.
- SYNONYMS: Dict[str, str] — Canonical synonym mappings to normalize meaning.
- OPENAI_CLIENT: OpenAI client — Used for AI grading fallback.
Invariants:
- All video metadata is resolved from DOWNLOADS_DIR/<video_id>.
- Grading logic follows: Numeric → RapidFuzz → AI fallback → Default.
- AI grading only triggers for borderline similarity scores.
Duration Helpers
- _parse_duration_to_seconds(val: Any): Optional[int] — Converts numeric or time-string duration to seconds.
Preconditions:
- val must be an int, float, or time-formatted string.
Postconditions:
- Returns non-negative integer seconds or None.
Throws:
- No explicit exceptions (invalid input returns None).
Example:
- _parse_duration_to_seconds("01:30") → 90
- _format_mmss(sec: Optional[int]): str — Formats seconds into MM:SS.
Preconditions:
- sec is None or a non-negative integer.
Postconditions:
- Returns the formatted string.
Example:
- _format_mmss(125) → "02:05"
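The two duration helpers can be sketched as follows. This is a minimal illustration, not the module's actual code: the function names drop the leading underscore, the accepted string shapes ("SS", "MM:SS", "HH:MM:SS") are an assumption, and rendering None as "00:00" is also an assumption, since the behavior for None is not stated above.

```python
import math
from typing import Any, Optional


def parse_duration_to_seconds(val: Any) -> Optional[int]:
    """Convert a number or a colon-separated time string to whole seconds."""
    if isinstance(val, bool):  # bool is an int subclass; treat it as invalid
        return None
    if isinstance(val, (int, float)):
        if not math.isfinite(val) or val < 0:
            return None
        return int(val)
    if isinstance(val, str):
        parts = val.strip().split(":")
        # Assumed formats: "SS", "MM:SS", "HH:MM:SS" with digit-only fields.
        if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
            return None
        seconds = 0
        for part in parts:  # each field shifts the running total by 60
            seconds = seconds * 60 + int(part)
        return seconds
    return None  # any other type is invalid input


def format_mmss(sec: Optional[int]) -> str:
    """Format seconds as zero-padded MM:SS (None or negative -> "00:00", assumed)."""
    if sec is None or sec < 0:
        return "00:00"
    minutes, seconds = divmod(int(sec), 60)
    return f"{minutes:02d}:{seconds:02d}"
```

Invalid input yields None rather than an exception, which matches the "no explicit exceptions" contract described for _parse_duration_to_seconds.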
Video Discovery
- refresh_kids_videos_json(): List[Dict[str, Any]] — Rebuilds the static kids_videos.json cache from the downloads directory.
Preconditions:
- DOWNLOADS_DIR exists.
Postconditions:
- Scans video folders.
- Extracts title, duration, thumbnail.
- Writes /static/kids_videos.json.
Throws:
- File write errors if the disk is unavailable.
Example:
- refresh_kids_videos_json()
- list_kids_videos(): Dict[str, Any] — Returns JSON list of locally available kids videos.
Preconditions:
- None.
Postconditions:
- Returns {success, count, videos}.
Example:
- GET /api/kids_videos
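A rough sketch of the scan-and-cache step described in this section. The on-disk layout is not specified here, so the metadata file name (meta.json), the thumbnail convention (first *.jpg in the folder), and the helper names build_video_index and list_videos_response are all hypothetical; only the one-folder-per-video layout and the {success, count, videos} response shape come from the description above.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def build_video_index(downloads_dir: Path, out_file: Path) -> List[Dict[str, Any]]:
    """Scan one folder per video and write the collected entries as JSON."""
    videos: List[Dict[str, Any]] = []
    for folder in sorted(p for p in downloads_dir.iterdir() if p.is_dir()):
        meta: Dict[str, Any] = {}
        meta_path = folder / "meta.json"  # hypothetical metadata file name
        if meta_path.exists():
            try:
                loaded = json.loads(meta_path.read_text(encoding="utf-8"))
                if isinstance(loaded, dict):
                    meta = loaded
            except json.JSONDecodeError:
                pass  # unreadable metadata: fall back to defaults below
        thumbs = sorted(folder.glob("*.jpg"))  # assumed thumbnail convention
        videos.append({
            "video_id": folder.name,
            "title": meta.get("title", folder.name),
            "duration": meta.get("duration"),
            "thumbnail": thumbs[0].name if thumbs else None,
        })
    # Write errors (for example a full or missing disk) propagate to the caller.
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(json.dumps(videos, indent=2), encoding="utf-8")
    return videos


def list_videos_response(videos: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap the entries in the {success, count, videos} response shape."""
    return {"success": True, "count": len(videos), "videos": videos}
```

Passing the directories in as arguments, instead of reading DOWNLOADS_DIR and BASE_DIR directly, keeps the sketch testable against a temporary directory.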
Final Question Retrieval
- get_final_questions(video_id: str): Dict[str, Any] — Retrieves best-ranked non-trashed question per segment.
Preconditions:
- final_questions.json exists for video_id.
Postconditions:
- Selects the question with the lowest llm_ranking.
- Skips trashed questions.
- Excludes the final segment from results.
Throws:
- Returns error JSON if the file is missing.
Example:
- GET /api/final-questions/{video_id}
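The selection rules above (lowest llm_ranking, skip trashed, drop the final segment) might look like the sketch below. The structure of final_questions.json is not documented here, so the input shape (a list of segments, each with a "questions" list whose entries carry "llm_ranking" and "trashed") and both function names are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional


def pick_best_question(questions: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the non-trashed question with the lowest llm_ranking, if any."""
    candidates = [q for q in questions if not q.get("trashed", False)]
    if not candidates:
        return None
    # Questions without a ranking sort last (assumed behavior).
    return min(candidates, key=lambda q: q.get("llm_ranking", float("inf")))


def select_final_questions(segments: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Pick one question per segment, excluding the final segment."""
    selected: List[Dict[str, Any]] = []
    for segment in segments[:-1]:  # the last segment is excluded from results
        best = pick_best_question(segment.get("questions", []))
        if best is not None:
            selected.append(best)
    return selected
```

A segment whose questions are all trashed contributes nothing to the result in this sketch; whether the real endpoint reports such segments differently is not stated.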
Answer Grading Helpers
- words_to_numbers(text: str): List[int] — Extracts numeric digits and word-based numbers.
Preconditions:
- text is a string.
Postconditions:
- Returns the list of detected numbers.
Example:
- words_to_numbers("three dogs and 2 cats") → [3, 2]
- normalize_text(text: str): str — Lowercases, removes stopwords/fillers, applies synonym mapping.
Preconditions:
- text is a string.
Postconditions:
- Returns the cleaned canonical string.
Example:
- normalize_text("The puppy is happy") → "dog happy"
- prepare_text_for_scoring(text: str): str — Cached normalized text with numeric hints appended.
Preconditions:
- text is a string.
Postconditions:
- Returns the optimized scoring string.
Throws:
- None.
Example:
- prepare_text_for_scoring("three dogs") → "dog 3"
- keyword_overlap(expected: str, user: str): float — Computes keyword overlap ratio.
Preconditions:
- Both inputs are pre-normalized strings.
Postconditions:
- Returns a similarity ratio in [0, 1].
- simplify_item(item: str): str — Simplifies list item to core keywords.
Preconditions:
- item is a string.
Postconditions:
- Returns the simplified representation.
- extract_items(expected_raw: str): List[str] — Splits comma/and-separated expected answers.
Preconditions:
- expected_raw contains a potential list.
Postconditions:
- Returns the simplified item list.
- list_match(expected_raw: str, user_raw: str): Tuple[int, int, List[str]] — Matches expected list items against user answer.
Preconditions:
- Inputs are raw strings.
Postconditions:
- Returns (matched_count, total_items, matched_items).
- required_items_from_question(question: str, expected: str): int — Determines how many items must be matched.
Preconditions:
- Question text may contain numeric words.
Postconditions:
- Returns the required match count.
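To make the helper contracts concrete, here is a small sketch of words_to_numbers, normalize_text, and keyword_overlap. The table contents below are tiny illustrative stand-ins for the module's NUM_WORDS, STOPWORDS, FILLER_WORDS, and SYNONYMS; SCALE_WORDS handling (hundred, thousand) is left out; and defining overlap as the share of expected keywords found in the user answer is an assumption, since the exact ratio is not specified above.

```python
import re
from typing import List

# Illustrative stand-ins; the real tables are module-level fields.
NUM_WORDS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
STOPWORDS = {"the", "is", "a", "an", "and", "of"}
FILLER_WORDS = {"um", "uh", "like"}
SYNONYMS = {"puppy": "dog", "dogs": "dog", "kitten": "cat", "cats": "cat"}


def words_to_numbers(text: str) -> List[int]:
    """Collect digit runs and known number words, in order of appearance."""
    numbers: List[int] = []
    for token in re.findall(r"[a-z]+|\d+", text.lower()):
        if token.isdigit():
            numbers.append(int(token))
        elif token in NUM_WORDS:
            numbers.append(NUM_WORDS[token])
    return numbers


def normalize_text(text: str) -> str:
    """Lowercase, drop stopwords and fillers, then map synonyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [
        SYNONYMS.get(t, t)
        for t in tokens
        if t not in STOPWORDS and t not in FILLER_WORDS
    ]
    return " ".join(kept)


def keyword_overlap(expected: str, user: str) -> float:
    """Share of expected keywords present in the user answer (assumed definition)."""
    expected_words = set(expected.split())
    if not expected_words:
        return 0.0
    return len(expected_words & set(user.split())) / len(expected_words)
```

With these stand-in tables, the sketch reproduces the two documented examples: "three dogs and 2 cats" yields [3, 2], and "The puppy is happy" yields "dog happy".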
Answer Grading Endpoint
- check_answer(payload: Dict[str, Any]): Dict[str, Any] — Grades user answer using numeric checks, RapidFuzz similarity, and optional AI fallback.
Preconditions:
- Payload must contain: expected, user, question.
Postconditions:
- Returns: similarity score, numeric flag, status (correct | almost | wrong), reason.
Throws:
- Returns structured JSON for invalid inputs.
Example:
- POST /api/check_answer
Grading Flow:
- Numeric validation (exact or mismatch).
- RapidFuzz similarity scoring.
- Multi-item partial match logic.
- AI evaluation for borderline cases.
- Default fallback.
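The flow above can be approximated with the following sketch. Several things here are assumptions rather than the module's behavior: difflib.SequenceMatcher from the standard library is swapped in for RapidFuzz so the example runs without extra dependencies, the thresholds CORRECT_AT and ALMOST_AT are made up (the real ones live in GRADING_CONFIG), the AI step is an injectable callable instead of an OpenAI request, the numeric check only looks at digits, and the multi-item partial match step is omitted.

```python
import re
from difflib import SequenceMatcher
from typing import Any, Callable, Dict, Optional

# Hypothetical thresholds; the real values live in GRADING_CONFIG.
CORRECT_AT = 0.85
ALMOST_AT = 0.60

# Stand-in for the AI grader: returns "correct", "almost", or "wrong".
AiJudge = Callable[[str, str], str]


def _result(similarity: float, numeric: bool, status: str, reason: str) -> Dict[str, Any]:
    return {"similarity": round(similarity, 3), "numeric": numeric,
            "status": status, "reason": reason}


def grade(expected: str, user: str, ai_judge: Optional[AiJudge] = None) -> Dict[str, Any]:
    # Step 1: numeric validation (digits only in this sketch).
    exp_nums = [int(n) for n in re.findall(r"\d+", expected)]
    usr_nums = [int(n) for n in re.findall(r"\d+", user)]
    if exp_nums:
        if exp_nums == usr_nums:
            return _result(1.0, True, "correct", "numeric match")
        return _result(0.0, True, "wrong", "numeric mismatch")

    # Step 2: string similarity (difflib used here in place of RapidFuzz).
    similarity = SequenceMatcher(
        None, expected.lower().strip(), user.lower().strip()
    ).ratio()
    if similarity >= CORRECT_AT:
        return _result(similarity, False, "correct", "high similarity")

    # Step 3: AI evaluation, only for borderline scores.
    if similarity >= ALMOST_AT and ai_judge is not None:
        verdict = ai_judge(expected, user)
        if verdict in ("correct", "almost", "wrong"):
            return _result(similarity, False, verdict, "ai verdict")

    # Step 4: default fallback based on the similarity score alone.
    status = "almost" if similarity >= ALMOST_AT else "wrong"
    return _result(similarity, False, status, "default")
```

Because the judge is consulted only inside the borderline band, clearly correct and clearly wrong answers never reach the AI step, which is the invariant stated at the top of this document.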
Audio Transcription
- transcribe_audio(file: UploadFile): Dict[str, Any] — Transcribes audio using Whisper model.
Preconditions:
- A valid audio file is uploaded.
Postconditions:
- Returns {success, text} or {success, error}.
Throws:
- Returns structured error JSON on failure.
Example:
- POST /api/transcribe
Frontend Configuration
- get_config(): Dict[str, Any] — Returns frontend configuration settings.
Preconditions:
- None.
Postconditions:
- Returns the skip_prevention flag and the grading thresholds from GRADING_CONFIG.
Example:
- GET /api/config