conversion_project/LOGGING_STRUCTURE.md
TylerCG 8d3aa03d72 v1.02a
added message cleanup and some better logic
2026-02-22 22:24:07 -05:00


# Structured Logging with Media Context
## Overview
The conversion system now uses **structured JSON logging** to enable organized analysis and filtering of conversion results by media type, show name, season, and episode.
## Terminal vs Log Output
- **Terminal Output**: Clean, human-readable print statements (VIDEO/AUDIO/PROGRESS sections)
- **Log Output**: Rich structured JSON with full media context for programmatic analysis
## Media Context Fields
Extracted automatically from file path structure:
```python
{
    "video_filename": "episode01.mkv",
    "media_type": "tv",           # "tv", "anime", "movie", or "other"
    "show_name": "Breaking Bad",
    "season": "01",               # Optional (TV/anime only)
    "episode": "01"               # Optional (TV/anime only)
}
```
## Usage Examples
### Path Structure Recognition
**TV Show**:
```
P:\tv\Breaking Bad\season01\episode01.mkv
→ media_type: "tv", show_name: "Breaking Bad", season: "01", episode: "01"
```
**Anime**:
```
P:\anime\Demon Slayer\season02\e12.mkv
→ media_type: "anime", show_name: "Demon Slayer", season: "02", episode: "12"
```
**Movie**:
```
P:\movies\Inception.mkv
→ media_type: "movie", show_name: "Inception"
```
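The recognition rules above can be sketched as follows. This is a hypothetical reconstruction, not the actual `get_media_context()` in `core/process_manager.py`, which may differ; `PureWindowsPath` is used so the backslash paths parse the same way on any OS.

```python
import re
from pathlib import PureWindowsPath


def get_media_context(path: str) -> dict:
    """Sketch of media-context extraction from a library path (illustrative only)."""
    p = PureWindowsPath(path)
    folders = [part.lower() for part in p.parts]
    context = {"video_filename": p.name, "media_type": "other"}
    if "movies" in folders:
        # Movies: the filename stem doubles as the title
        context["media_type"] = "movie"
        context["show_name"] = p.stem
    elif "tv" in folders or "anime" in folders:
        media = "tv" if "tv" in folders else "anime"
        idx = folders.index(media)
        context["media_type"] = media
        if idx + 1 < len(p.parts):
            context["show_name"] = p.parts[idx + 1]
        # Accept "season01"/"Season 1" folders and "episode01"/"e12" filenames
        season = re.search(r"season\s*(\d+)", str(p), re.IGNORECASE)
        episode = re.search(r"(?:episode|e)(\d+)", p.stem, re.IGNORECASE)
        if season:
            context["season"] = season.group(1).zfill(2)
        if episode:
            context["episode"] = episode.group(1).zfill(2)
    return context
```

Anything outside the `tv`, `anime`, and `movies` library folders falls through to `media_type: "other"` with only the filename recorded.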
## Log Output Format
JSON logs contain both the message and media context:
```json
{
    "timestamp": "2026-02-22 15:30:45",
    "level": "INFO",
    "message": "✅ CONVERSION COMPLETE: episode01[EHX].mkv",
    "video_filename": "episode01.mkv",
    "media_type": "tv",
    "show_name": "Breaking Bad",
    "season": "01",
    "episode": "01",
    "method": "CQ",
    "original_size_mb": 4096.5,
    "output_size_mb": 1843.2,
    "reduction_pct": 55.0
}
```
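A formatter producing entries like the one above can be sketched roughly like this. It is a hypothetical reconstruction of the `JsonFormatter` in `core/logger_helper.py`, not the actual class. The key idea: fields passed via `extra={}` land directly in the record's `__dict__`, so diffing against the standard `LogRecord` attributes recovers them.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Sketch of a JSON-lines formatter that merges in extra fields (illustrative)."""

    # Attributes present on every LogRecord; anything else arrived via extra={}
    _STANDARD = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%d %H:%M:%S"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Merge in extra fields (media context, sizes, method, etc.)
        for key, value in vars(record).items():
            if key not in self._STANDARD:
                entry[key] = value
        return json.dumps(entry, ensure_ascii=False)
```

Attaching it to a handler is the usual `handler.setFormatter(JsonFormatter())`; every `logger.info(msg, extra=media_context)` then emits one JSON line.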
## Filtering Logs Later
You can parse the JSON logs to group by show/season/episode:
```python
import json

# Print every Breaking Bad conversion with its size reduction
with open("logs/conversion.log") as f:
    for line in f:
        entry = json.loads(line)
        if entry.get("show_name") == "Breaking Bad":
            print(f"S{entry['season']}E{entry['episode']}: {entry['reduction_pct']}% reduction")
```
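Beyond per-show filtering, the same JSON-lines format supports aggregation. As a sketch (the `summarize` helper below is illustrative, not part of the project), average reduction per show and season:

```python
import json
from collections import defaultdict


def summarize(log_lines):
    """Average reduction_pct per (show_name, season) from JSON-lines log entries."""
    reductions = defaultdict(list)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between entries
        entry = json.loads(line)
        if "reduction_pct" in entry and "show_name" in entry:
            key = (entry["show_name"], entry.get("season", "-"))
            reductions[key].append(entry["reduction_pct"])
    return {key: sum(vals) / len(vals) for key, vals in reductions.items()}
```

Feeding it an open log file (`summarize(open("logs/conversion.log"))`) yields a dict keyed by `(show, season)`, which is easy to sort or tabulate.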
## Current Implementation
**Files Updated**:
- `core/process_manager.py`:
- Added `get_media_context()` function to parse file paths
- Extracts media context once per file processing
- Passes context to all logging calls via `extra={}` parameter
- `core/logger_helper.py`:
- JsonFormatter automatically includes all extra fields in output
- Added `log_event()` helper for consistent structured logging
## Best Practices
1. Always call `get_media_context()` once per file
2. Pass the result to every logging call for that file: `logger.info(msg, extra=media_context)`
3. For additional context: `logger.info(msg, extra={**media_context, "custom_field": value})`
4. Parse logs with JSON reader for reliable data extraction
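Practices 1–3 combine like this in a minimal sketch; the function name is hypothetical and the `media_context` dict is hard-coded where the real code would call `get_media_context()` once per file.

```python
import logging


def log_conversion(logger: logging.Logger) -> None:
    """Illustrative sketch: one context dict, reused on every logging call."""
    media_context = {                      # 1. built once per file
        "video_filename": "episode01.mkv",
        "media_type": "tv",
        "show_name": "Breaking Bad",
        "season": "01",
        "episode": "01",
    }
    logger.info("Starting conversion", extra=media_context)   # 2. same context everywhere
    logger.info(
        "CONVERSION COMPLETE",
        extra={**media_context, "reduction_pct": 55.0},       # 3. merge per-event fields
    )
```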