# Visualization API

> Dataset visualization utilities for ML training dashboards

## Overview

The visualization API provides utilities for rendering interactive charts, sample grids, and dataset previews using Plotly and Streamlit components.

## Class Distribution Visualization

### render\_class\_distribution\_chart(dataset\_info)

Render an interactive Plotly bar chart showing sample distribution across classes.

<ParamField path="dataset_info" type="dict" required>
  Dataset information dictionary. Must contain train\_samples and val\_samples, each mapping class names to sample counts.
</ParamField>

```python theme={null}
from utils.dataset_viz import render_class_distribution_chart
from state.cache import get_dataset_info

dataset_info = get_dataset_info()
if dataset_info:
    render_class_distribution_chart(dataset_info)
```

**Expected dataset\_info Structure:**

```python theme={null}
dataset_info = {
    "train_samples": {
        "malware_family_1": 450,
        "malware_family_2": 380,
        "malware_family_3": 290
    },
    "val_samples": {
        "malware_family_1": 90,
        "malware_family_2": 75,
        "malware_family_3": 60
    }
}
```

**Chart Features:**

* Grouped bar chart (Training vs Validation)
* Custom colors (#98c127 for training, #8fd7d7 for validation)
* 45-degree rotated x-axis labels
* Dark theme with transparent background
* Auto-scales to container width

**File reference:** `app/utils/dataset_viz.py:12`
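
The chart features listed above can be approximated with a plain Plotly figure specification. This is a minimal sketch, not the code in `dataset_viz.py`: the helper name `build_distribution_figure` is hypothetical, and it returns a plain dict in Plotly's figure format, which `plotly.graph_objects.Figure` and `st.plotly_chart` should both accept.

```python
def build_distribution_figure(dataset_info: dict) -> dict:
    """Build a grouped-bar figure spec from per-class counts (hypothetical helper)."""
    train = dataset_info.get("train_samples", {})
    val = dataset_info.get("val_samples", {})
    # Use one sorted class list so both traces line up on the x-axis.
    classes = sorted(set(train) | set(val))
    return {
        "data": [
            {
                "type": "bar",
                "name": "Training",
                "x": classes,
                "y": [train.get(name, 0) for name in classes],
                "marker": {"color": "#98c127"},
            },
            {
                "type": "bar",
                "name": "Validation",
                "x": classes,
                "y": [val.get(name, 0) for name in classes],
                "marker": {"color": "#8fd7d7"},
            },
        ],
        "layout": {
            "barmode": "group",                # side-by-side bars per class
            "paper_bgcolor": "rgba(0,0,0,0)",  # transparent background
            "plot_bgcolor": "rgba(0,0,0,0)",
            "font": {"color": "#fafafa"},
            "xaxis": {"tickangle": -45},       # rotated class labels
        },
    }


figure = build_distribution_figure(
    {
        "train_samples": {"malware_family_1": 450, "malware_family_2": 380},
        "val_samples": {"malware_family_1": 90},
    }
)
```

A class missing from one split gets a zero-height bar here, which keeps the two traces aligned; the real function may handle that case differently.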

## Class Summary

### render\_class\_summary(dataset\_info)

Display the five most common and five least common classes, ranked by sample count, in a two-column layout.

<ParamField path="dataset_info" type="dict" required>
  Dataset information dictionary containing train\_samples
</ParamField>

```python theme={null}
from utils.dataset_viz import render_class_summary

render_class_summary(dataset_info)
```

**Output Layout:**

```
┌─────────────────────┬─────────────────────┐
│  Most Common        │  Least Common       │
├─────────────────────┼─────────────────────┤
│  class_a: 1,245     │  class_x: 45        │
│  class_b: 982       │  class_y: 67        │
│  class_c: 834       │  class_z: 89        │
│  class_d: 756       │  class_w: 102       │
│  class_e: 698       │  class_v: 134       │
└─────────────────────┴─────────────────────┘
```

**File reference:** `app/utils/dataset_viz.py:55`
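
The ranking behind this layout can be sketched in a few lines. The helper `summarize_classes` is a hypothetical name used for illustration, and the sketch assumes the counts come from train\_samples; the actual function may order ties or format rows differently.

```python
def summarize_classes(train_samples: dict[str, int], n: int = 5):
    """Return (most_common, least_common) lists of (class_name, count) pairs."""
    # Descending order for the "Most Common" column.
    most_common = sorted(train_samples.items(), key=lambda item: item[1], reverse=True)[:n]
    # Ascending order for the "Least Common" column.
    least_common = sorted(train_samples.items(), key=lambda item: item[1])[:n]
    return most_common, least_common


counts = {"class_a": 1245, "class_b": 982, "class_x": 45, "class_y": 67}
most_common, least_common = summarize_classes(counts, n=2)

# Format rows with thousands separators, as in the layout above.
rows = [f"{name}: {count:,}" for name, count in most_common]
```

With fewer than `2 * n` classes the two lists overlap, so a small dataset shows some classes in both columns.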

## Sample Grid

### render\_sample\_grid(dataset\_info, selected\_class)

Display a grid of sample images with their dimensions.

<ParamField path="dataset_info" type="dict" required>
  Dataset information dictionary containing sample\_paths
</ParamField>

<ParamField path="selected_class" type="str" required>
  Class name to display samples from, or "All" for mixed samples
</ParamField>

```python theme={null}
from utils.dataset_viz import render_sample_grid
import streamlit as st

selected_class = st.selectbox(
    "Select Class",
    options=["All"] + dataset_info["classes"]
)

render_sample_grid(dataset_info, selected_class)
```

**Expected dataset\_info Structure:**

```python theme={null}
from pathlib import Path

dataset_info = {
    "sample_paths": {
        "malware_family_1": [
            Path("/path/to/sample1.png"),
            Path("/path/to/sample2.png"),
            # ... up to 10 samples per class
        ],
        "malware_family_2": [...]
    },
    "classes": ["malware_family_1", "malware_family_2"]
}
```

**Grid Features:**

* 5-column layout
* Up to 10 samples displayed
* Image dimensions shown below each sample
* Random sampling when "All" is selected (2 per class, max 10 total)
* Error handling for missing/corrupt images

**File reference:** `app/utils/dataset_viz.py:73`

**Example Output:**

```
┌─────┬─────┬─────┬─────┬─────┐
│ img │ img │ img │ img │ img │
│224x │224x │224x │224x │224x │
│ 224 │ 224 │ 224 │ 224 │ 224 │
├─────┼─────┼─────┼─────┼─────┤
│ img │ img │ img │ img │ img │
│224x │224x │224x │224x │224x │
│ 224 │ 224 │ 224 │ 224 │ 224 │
└─────┴─────┴─────┴─────┴─────┘
```
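
The selection rules in the feature list (2 per class and at most 10 in total for "All", then a 5-column grid) can be sketched as below. Both helpers, `pick_samples` and `chunk_rows`, are hypothetical names, and the `seed` parameter exists only to make the sketch reproducible; the real sampling in `dataset_viz.py` may differ in ordering.

```python
import random


def pick_samples(sample_paths: dict, selected_class: str,
                 per_class: int = 2, max_total: int = 10, seed=None) -> list:
    """Choose which sample paths to show for a class, or a mix for "All"."""
    if selected_class != "All":
        return list(sample_paths.get(selected_class, []))[:max_total]
    rng = random.Random(seed)
    picked = []
    for paths in sample_paths.values():
        paths = list(paths)
        picked.extend(rng.sample(paths, min(per_class, len(paths))))
    # Shuffle before truncating so no class is always cut off.
    rng.shuffle(picked)
    return picked[:max_total]


def chunk_rows(items: list, columns: int = 5) -> list:
    """Split a flat list into grid rows of at most `columns` items."""
    return [items[i:i + columns] for i in range(0, len(items), columns)]


sample_paths = {
    f"family_{i}": [f"/data/family_{i}/img_{j}.png" for j in range(3)]
    for i in range(6)
}
mixed = pick_samples(sample_paths, "All", seed=0)
grid = chunk_rows(mixed)
single = pick_samples(sample_paths, "family_0")
```

Each row of `grid` would then map onto one `st.columns(5)` call in a Streamlit page.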

## Data Split Visualization

### render\_split\_pie\_chart(train\_final, val\_final, test\_final)

Render a donut chart showing the distribution of data splits.

<ParamField path="train_final" type="int" required>
  Number of training samples
</ParamField>

<ParamField path="val_final" type="int" required>
  Number of validation samples
</ParamField>

<ParamField path="test_final" type="int" required>
  Number of test samples
</ParamField>

```python theme={null}
from utils.dataset_viz import render_split_pie_chart

train_count = 7000
val_count = 1500
test_count = 1500

render_split_pie_chart(train_count, val_count, test_count)
```

**Chart Features:**

* Donut chart with 30% hole
* Custom colors:
  * Training: #98c127 (green)
  * Validation: #8fd7d7 (cyan)
  * Test: #ffb255 (orange)
* Compact layout (300px height)
* Dark theme with transparent background
* Percentage labels automatically calculated

**File reference:** `app/utils/dataset_viz.py:104`
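
The percentage labels come from the three counts. The arithmetic is sketched below for reference; `split_percentages` is a hypothetical helper, not part of the API, and the chart itself presumably leaves this calculation to Plotly.

```python
def split_percentages(train_final: int, val_final: int, test_final: int) -> dict[str, float]:
    """Return each split's share of the total as a percentage."""
    counts = {"Training": train_final, "Validation": val_final, "Test": test_final}
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero when no samples are configured yet.
        return {label: 0.0 for label in counts}
    return {label: 100 * count / total for label, count in counts.items()}


shares = split_percentages(7000, 1500, 1500)
```

For the counts used in the example above, this gives a 70/15/15 split.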

## Preprocessing Preview

### render\_preprocessing\_preview(sample\_path, target\_size, color\_mode)

Show before/after comparison of image preprocessing.

<ParamField path="sample_path" type="Path | str" required>
  Path to sample image file
</ParamField>

<ParamField path="target_size" type="str" required>
  Target size in format "WIDTHxHEIGHT" (e.g., "224x224")
</ParamField>

<ParamField path="color_mode" type="str" required>
  Color mode - "RGB" or "Grayscale"
</ParamField>

```python theme={null}
from utils.dataset_viz import render_preprocessing_preview
from pathlib import Path

sample_path = Path("/path/to/sample.png")

render_preprocessing_preview(
    sample_path=sample_path,
    target_size="224x224",
    color_mode="RGB"
)
```

**Visual Layout:**

```
┌─────────────────────┬─────────────────────┐
│  Original Image     │  After Processing   │
├─────────────────────┼─────────────────────┤
│  [Original Image]   │  [Processed Image]  │
│  Size: 512x512      │  Size: 224x224      │
│                     │  Mode: RGB          │
└─────────────────────┴─────────────────────┘
```

**Preprocessing Operations:**

1. Resize to target dimensions using LANCZOS resampling
2. Convert to grayscale if color\_mode is "Grayscale"
3. Display size and mode information

**File reference:** `app/utils/dataset_viz.py:126`

**Example with Grayscale:**

```python theme={null}
render_preprocessing_preview(
    sample_path="dataset/class_a/sample_001.png",
    target_size="128x128",
    color_mode="Grayscale"
)
```
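
Because target\_size and color\_mode are passed as strings, they have to be translated before Pillow can use them. The sketch below shows one way to do that; `parse_preview_options` and `PIL_MODES` are hypothetical names, and the validation rules are assumptions rather than the behavior of `render_preprocessing_preview`.

```python
# Map the documented color_mode values to Pillow mode strings.
PIL_MODES = {"RGB": "RGB", "Grayscale": "L"}


def parse_preview_options(target_size: str, color_mode: str) -> tuple[tuple[int, int], str]:
    """Turn "WIDTHxHEIGHT" and a color mode name into ((width, height), pil_mode)."""
    width_text, separator, height_text = target_size.lower().partition("x")
    if not separator:
        raise ValueError(f"Expected WIDTHxHEIGHT, got {target_size!r}")
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError(f"Dimensions must be positive, got {target_size!r}")
    if color_mode not in PIL_MODES:
        raise ValueError(f"Unsupported color mode: {color_mode!r}")
    return (width, height), PIL_MODES[color_mode]


size, mode = parse_preview_options("128x128", "Grayscale")
```

The resulting values can be passed to Pillow as `image.resize(size, Image.Resampling.LANCZOS)` followed by `image.convert(mode)`.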

## Plotly Configuration

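The chart functions share the Plotly theming below. Individual charts differ where noted elsewhere on this page: the split donut chart uses a 300px height, and the x-axis tick angle only affects the bar chart.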

```python theme={null}
# Common layout settings
layout = {
    "paper_bgcolor": "rgba(0,0,0,0)",  # Transparent background
    "plot_bgcolor": "rgba(0,0,0,0)",   # Transparent plot area
    "font": {"color": "#fafafa"},       # Light text color
    "height": 500,                      # Chart height
    "xaxis": {"tickangle": -45}         # Rotated x-axis labels
}
```

**Color Palette:**

* Training data: `#98c127` (Soft green)
* Validation data: `#8fd7d7` (Soft cyan)
* Test data: `#ffb255` (Soft orange)
* Accent colors: `#f45f74` (Soft pink), `#bdd373` (Light green)
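
One way to keep this theming consistent while still allowing per-chart values is to merge a shared base layout with overrides. This is a sketch only: `COMMON_LAYOUT` and `make_layout` are hypothetical names, and the source does not say whether `dataset_viz.py` is organized this way.

```python
# Settings every chart shares (taken from the layout shown above).
COMMON_LAYOUT = {
    "paper_bgcolor": "rgba(0,0,0,0)",
    "plot_bgcolor": "rgba(0,0,0,0)",
    "font": {"color": "#fafafa"},
}


def make_layout(**overrides) -> dict:
    """Return a new layout dict: shared theme plus per-chart overrides."""
    # Copy the nested font dict so callers cannot mutate the shared base.
    layout = {**COMMON_LAYOUT, "font": dict(COMMON_LAYOUT["font"])}
    layout.update(overrides)
    return layout


bar_layout = make_layout(height=500, barmode="group", xaxis={"tickangle": -45})
donut_layout = make_layout(height=300)
```

Either dict can then be applied with `fig.update_layout(**bar_layout)` on a Plotly figure.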

## Image Processing

The visualization utilities use PIL (Pillow) for image processing:

```python theme={null}
from PIL import Image

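# Resize with high-quality LANCZOS resampling (used for all resize operations)
image = Image.open("/path/to/sample.png")
resized = image.resize((224, 224), Image.Resampling.LANCZOS)

# Color mode conversion
grayscale = resized.convert("L")  # Convert to grayscale
rgb = resized.convert("RGB")  # Convert to RGB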
```

## Error Handling

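The visualization functions guard image loading with a `try`/`except` block, so a missing or corrupt file is reported inline and the rest of the page still renders: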

```python theme={null}
try:
    img = Image.open(img_path)
    st.image(img, width="stretch")
    st.caption(f"{img.size[0]}x{img.size[1]}")
except Exception as exception:
    st.error(f"Error: {img_path.name}. {exception}")
```

## Best Practices

**Performance Optimization:**

```python theme={null}
# Cache dataset info to avoid repeated scans
from state.cache import get_dataset_info, set_dataset_info
from utils.dataset_viz import render_class_distribution_chart

dataset_info = get_dataset_info()
if not dataset_info:
    # Perform the expensive scan once.
    # scan_dataset and path stand in for your own scanning code.
    dataset_info = scan_dataset(path)
    set_dataset_info(dataset_info)  # Cache result

# Use cached data
render_class_distribution_chart(dataset_info)
```
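
The caching pattern above is a "check, scan once, store" sequence. The sketch below isolates it with a plain dict standing in for the real store, since the internals of `state.cache` are not documented here; `get_or_scan` and `fake_scan` are hypothetical names. Streamlit's `st.session_state` offers similar dict-style access.

```python
from typing import Callable


def get_or_scan(store: dict, scan: Callable[[], dict]) -> dict:
    """Return cached dataset info, running the scan only on the first call."""
    if "dataset_info" not in store:
        store["dataset_info"] = scan()
    return store["dataset_info"]


calls = {"count": 0}


def fake_scan() -> dict:
    """Stand-in for an expensive directory scan; counts how often it runs."""
    calls["count"] += 1
    return {"train_samples": {"malware_family_1": 450}}


store: dict = {}
first = get_or_scan(store, fake_scan)
second = get_or_scan(store, fake_scan)
```

The second call returns the stored dict without invoking the scan again.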

**Responsive Layouts:**

```python theme={null}
import streamlit as st

# Use columns for side-by-side visualizations
col1, col2 = st.columns(2)

with col1:
    render_class_summary(dataset_info)

with col2:
    # train, val, and test are the integer sample counts for each split
    render_split_pie_chart(train, val, test)
```

**Conditional Rendering:**

```python theme={null}
if dataset_info and dataset_info.get("train_samples"):
    render_class_distribution_chart(dataset_info)
else:
    st.warning("No dataset information available")
```
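
The same check can be generalized to the other renderers. The sketch below derives the required keys from the parameter descriptions on this page; `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and the real functions may read additional keys.

```python
# Keys each renderer needs, per the parameter descriptions above.
REQUIRED_KEYS = {
    "render_class_distribution_chart": ("train_samples", "val_samples"),
    "render_class_summary": ("train_samples",),
    "render_sample_grid": ("sample_paths",),
}


def missing_keys(dataset_info, renderer: str) -> list[str]:
    """List the required keys that are absent or empty for a renderer."""
    required = REQUIRED_KEYS[renderer]
    if not dataset_info:
        return list(required)
    return [key for key in required if not dataset_info.get(key)]


info = {"train_samples": {"malware_family_1": 450}, "val_samples": {}}
chart_gaps = missing_keys(info, "render_class_distribution_chart")
summary_gaps = missing_keys(info, "render_class_summary")
```

An empty result means the renderer can be called; otherwise the missing key names can be shown in an `st.warning` message.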

## Integration Example

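A complete example of the dataset visualization workflow on a Streamlit page. It assumes that `dataset_info` also carries `total_train` and `total_val` (and optionally `total_test`), because the code reads those keys: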

```python theme={null}
import streamlit as st
from state.cache import get_dataset_info
from utils.dataset_viz import (
    render_class_distribution_chart,
    render_class_summary,
    render_sample_grid,
    render_split_pie_chart,
    render_preprocessing_preview
)

st.header("Dataset Overview")

# Get cached dataset info
dataset_info = get_dataset_info()

if dataset_info:
    # Distribution chart
    st.subheader("Class Distribution")
    render_class_distribution_chart(dataset_info)
    
    # Summary statistics
    col1, col2 = st.columns(2)
    with col1:
        render_class_summary(dataset_info)
    with col2:
        total_train = dataset_info["total_train"]
        total_val = dataset_info["total_val"]
        total_test = dataset_info.get("total_test", 0)
        render_split_pie_chart(total_train, total_val, total_test)
    
    # Sample grid
    st.subheader("Sample Images")
    selected_class = st.selectbox(
        "View samples from:",
        options=["All"] + dataset_info["classes"]
    )
    render_sample_grid(dataset_info, selected_class)
    
    # Preprocessing preview
    st.subheader("Preprocessing Preview")
    sample_paths = dataset_info["sample_paths"]
    if sample_paths:
        first_class = list(sample_paths.keys())[0]
        sample_path = sample_paths[first_class][0]
        render_preprocessing_preview(
            sample_path,
            target_size="224x224",
            color_mode="RGB"
        )
else:
    st.info("Configure your dataset to view visualizations")
```