Overview

The visualization API provides utilities for rendering interactive charts, sample grids, and dataset previews using Plotly and Streamlit components.

Class Distribution Visualization

render_class_distribution_chart(dataset_info)

Render an interactive Plotly bar chart showing sample distribution across classes.
dataset_info (dict, required): Dataset information dictionary containing train_samples and val_samples.
from utils.dataset_viz import render_class_distribution_chart
from state.cache import get_dataset_info

dataset_info = get_dataset_info()
if dataset_info:
    render_class_distribution_chart(dataset_info)
Expected dataset_info Structure:
dataset_info = {
    "train_samples": {
        "malware_family_1": 450,
        "malware_family_2": 380,
        "malware_family_3": 290
    },
    "val_samples": {
        "malware_family_1": 90,
        "malware_family_2": 75,
        "malware_family_3": 60
    }
}
Chart Features:
  • Grouped bar chart (Training vs Validation)
  • Custom colors (#98c127 for training, #8fd7d7 for validation)
  • 45-degree rotated x-axis labels
  • Dark theme with transparent background
  • Auto-scales to container width
File reference: app/utils/dataset_viz.py:12
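To illustrate how the chart features above map onto a Plotly figure, the sketch below builds a grouped-bar figure specification as a plain dictionary from the documented dataset_info structure. The function name `build_distribution_spec` is hypothetical, and this is not the code in app/utils/dataset_viz.py; it assumes the resulting dictionary would be handed to `plotly.graph_objects.Figure` or `st.plotly_chart`.

```python
def build_distribution_spec(dataset_info):
    """Return a Plotly-style figure dict for a grouped Training vs Validation bar chart."""
    train = dataset_info.get("train_samples", {})
    val = dataset_info.get("val_samples", {})
    # One x position per class, including classes present in only one split
    classes = sorted(set(train) | set(val))

    def bar(name, counts, color):
        return {
            "type": "bar",
            "name": name,
            "x": classes,
            "y": [counts.get(c, 0) for c in classes],
            "marker": {"color": color},
        }

    return {
        "data": [
            bar("Training", train, "#98c127"),
            bar("Validation", val, "#8fd7d7"),
        ],
        "layout": {
            "barmode": "group",  # bars side by side rather than stacked
            "xaxis": {"tickangle": -45},
            "paper_bgcolor": "rgba(0,0,0,0)",
            "plot_bgcolor": "rgba(0,0,0,0)",
        },
    }


spec = build_distribution_spec({
    "train_samples": {"malware_family_1": 450, "malware_family_2": 380},
    "val_samples": {"malware_family_1": 90, "malware_family_2": 75},
})
print(spec["data"][0]["y"])  # [450, 380]
```

Classes missing from one split are plotted with a count of 0, which keeps the two bar series aligned.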

Class Summary

render_class_summary(dataset_info)

Display the top 5 and bottom 5 classes by sample count in a two-column layout.
dataset_info (dict, required): Dataset information dictionary containing train_samples.
from utils.dataset_viz import render_class_summary

render_class_summary(dataset_info)
Output Layout:
┌─────────────────────┬─────────────────────┐
│  Most Common        │  Least Common       │
├─────────────────────┼─────────────────────┤
│  class_a: 1,245     │  class_x: 45        │
│  class_b: 982       │  class_y: 67        │
│  class_c: 834       │  class_z: 89        │
│  class_d: 756       │  class_w: 102       │
│  class_e: 698       │  class_v: 134       │
└─────────────────────┴─────────────────────┘
File reference: app/utils/dataset_viz.py:55
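The ranking behind the two columns can be sketched as follows. This is a minimal illustration that assumes the ranking uses the train_samples counts; `top_and_bottom` is a hypothetical helper, not a function exported by utils.dataset_viz.

```python
def top_and_bottom(train_samples, n=5):
    """Return (most_common, least_common) lists of (class_name, count) pairs."""
    ranked = sorted(train_samples.items(), key=lambda item: item[1], reverse=True)
    most_common = ranked[:n]
    # Reverse the tail so the rarest class is listed first
    least_common = ranked[-n:][::-1]
    return most_common, least_common


counts = {"class_a": 1245, "class_b": 982, "class_x": 45, "class_y": 67}
most, least = top_and_bottom(counts, n=2)
print(most)   # [('class_a', 1245), ('class_b', 982)]
print(least)  # [('class_x', 45), ('class_y', 67)]
```

With fewer than 2 × n classes, the two lists overlap, so a dataset with only a handful of classes will show some names in both columns.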

Sample Grid

render_sample_grid(dataset_info, selected_class)

Display a grid of sample images with their dimensions.
dataset_info (dict, required): Dataset information dictionary containing sample_paths.
selected_class (str, required): Class name to display samples from, or “All” for mixed samples.
from utils.dataset_viz import render_sample_grid
import streamlit as st

selected_class = st.selectbox(
    "Select Class",
    options=["All"] + dataset_info["classes"]
)

render_sample_grid(dataset_info, selected_class)
Expected dataset_info Structure:
dataset_info = {
    "sample_paths": {
        "malware_family_1": [
            Path("/path/to/sample1.png"),
            Path("/path/to/sample2.png"),
            # ... up to 10 samples per class
        ],
        "malware_family_2": [...]
    },
    "classes": ["malware_family_1", "malware_family_2"]
}
Grid Features:
  • 5-column layout
  • Up to 10 samples displayed
  • Image dimensions shown below each sample
  • Random sampling when “All” is selected (2 per class, max 10 total)
  • Error handling for missing/corrupt images
File reference: app/utils/dataset_viz.py:73
Example Output:
┌─────┬─────┬─────┬─────┬─────┐
│ img │ img │ img │ img │ img │
│224x │224x │224x │224x │224x │
│ 224 │ 224 │ 224 │ 224 │ 224 │
├─────┼─────┼─────┼─────┼─────┤
│ img │ img │ img │ img │ img │
│224x │224x │224x │224x │224x │
│ 224 │ 224 │ 224 │ 224 │ 224 │
└─────┴─────┴─────┴─────┴─────┘
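The selection rules listed under Grid Features (2 random samples per class and at most 10 in total for “All”, laid out in 5 columns) could be implemented along these lines. `pick_samples` and `to_rows` are hypothetical names, and the `seed` parameter is an assumption added only to make the example reproducible.

```python
import random


def pick_samples(sample_paths, selected_class, per_class=2, limit=10, seed=None):
    """Choose which sample paths to show for one class, or a mix for "All"."""
    rng = random.Random(seed)
    if selected_class != "All":
        return list(sample_paths.get(selected_class, []))[:limit]

    picked = []
    for paths in sample_paths.values():
        # Take at most `per_class` random samples from each class
        picked.extend(rng.sample(list(paths), min(per_class, len(paths))))
    rng.shuffle(picked)
    return picked[:limit]


def to_rows(items, columns=5):
    """Split a flat list into rows of `columns` items for a grid layout."""
    return [items[i:i + columns] for i in range(0, len(items), columns)]


paths = {f"family_{i}": [f"family_{i}/img_{j}.png" for j in range(4)] for i in range(7)}
mixed = pick_samples(paths, "All", seed=0)
print(len(mixed))           # 10
print(len(to_rows(mixed)))  # 2
```

Each row returned by `to_rows` would correspond to one call to `st.columns(5)` in a Streamlit page.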

Data Split Visualization

render_split_pie_chart(train_final, val_final, test_final)

Render a donut chart showing the distribution of data splits.
train_final (int, required): Number of training samples.
val_final (int, required): Number of validation samples.
test_final (int, required): Number of test samples.
from utils.dataset_viz import render_split_pie_chart

train_count = 7000
val_count = 1500
test_count = 1500

render_split_pie_chart(train_count, val_count, test_count)
Chart Features:
  • Donut chart with 30% hole
  • Custom colors:
    • Training: #98c127 (green)
    • Validation: #8fd7d7 (cyan)
    • Test: #ffb255 (orange)
  • Compact layout (300px height)
  • Dark theme with transparent background
  • Percentage labels automatically calculated
File reference: app/utils/dataset_viz.py:104
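As a rough sketch of the features above, the donut chart can be described by a Plotly-style figure dictionary with a `pie` trace and a `hole` of 0.3. The helper names below are hypothetical and the code is not taken from app/utils/dataset_viz.py; `split_percentages` only reproduces by hand the percentages that Plotly derives for the labels.

```python
def build_split_spec(train_final, val_final, test_final):
    """Return a Plotly-style figure dict for a donut chart of the data splits."""
    return {
        "data": [{
            "type": "pie",
            "labels": ["Training", "Validation", "Test"],
            "values": [train_final, val_final, test_final],
            "hole": 0.3,  # 30% hole turns the pie into a donut
            "marker": {"colors": ["#98c127", "#8fd7d7", "#ffb255"]},
        }],
        "layout": {
            "height": 300,
            "paper_bgcolor": "rgba(0,0,0,0)",
            "plot_bgcolor": "rgba(0,0,0,0)",
        },
    }


def split_percentages(train_final, val_final, test_final):
    """Percent share of each split, rounded to one decimal place."""
    total = train_final + val_final + test_final
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [round(100 * n / total, 1) for n in (train_final, val_final, test_final)]


print(split_percentages(7000, 1500, 1500))  # [70.0, 15.0, 15.0]
```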

Preprocessing Preview

render_preprocessing_preview(sample_path, target_size, color_mode)

Show before/after comparison of image preprocessing.
sample_path (Path | str, required): Path to sample image file.
target_size (str, required): Target size in format “WIDTHxHEIGHT” (e.g., “224x224”).
color_mode (str, required): Color mode, either “RGB” or “Grayscale”.
from utils.dataset_viz import render_preprocessing_preview
from pathlib import Path

sample_path = Path("/path/to/sample.png")

render_preprocessing_preview(
    sample_path=sample_path,
    target_size="224x224",
    color_mode="RGB"
)
Visual Layout:
┌─────────────────────┬─────────────────────┐
│  Original Image     │  After Processing   │
├─────────────────────┼─────────────────────┤
│  [Original Image]   │  [Processed Image]  │
│  Size: 512x512      │  Size: 224x224      │
│                     │  Mode: RGB          │
└─────────────────────┴─────────────────────┘
Preprocessing Operations:
  1. Resize to target dimensions using LANCZOS resampling
  2. Convert to grayscale if color_mode is “Grayscale”
  3. Display size and mode information
File reference: app/utils/dataset_viz.py:126
Example with Grayscale:
render_preprocessing_preview(
    sample_path="dataset/class_a/sample_001.png",
    target_size="128x128",
    color_mode="Grayscale"
)
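Because target_size is passed as a “WIDTHxHEIGHT” string, it has to be split into integers before it can be used for resizing (Pillow's `Image.resize` takes a `(width, height)` tuple). A minimal sketch of that parsing step, using a hypothetical `parse_target_size` helper:

```python
def parse_target_size(target_size):
    """Parse a "WIDTHxHEIGHT" string into a (width, height) tuple of ints."""
    width_text, sep, height_text = target_size.lower().partition("x")
    if not sep:
        raise ValueError(f"Expected WIDTHxHEIGHT, got {target_size!r}")
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError(f"Dimensions must be positive, got {target_size!r}")
    return width, height


print(parse_target_size("224x224"))  # (224, 224)
print(parse_target_size("128x64"))   # (128, 64)
```

Malformed input such as "224" or "0x224" raises `ValueError`, which a caller could surface with `st.error` instead of letting the page fail.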

Plotly Configuration

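All visualization functions share the same Plotly theming; individual charts override values where needed (the split donut chart, for example, uses a 300px height):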
# Common layout settings
layout = {
    "paper_bgcolor": "rgba(0,0,0,0)",  # Transparent background
    "plot_bgcolor": "rgba(0,0,0,0)",   # Transparent plot area
    "font": {"color": "#fafafa"},       # Light text color
    "height": 500,                      # Chart height
    "xaxis": {"tickangle": -45}         # Rotated x-axis labels
}
Color Palette:
  • Training data: #98c127 (Soft green)
  • Validation data: #8fd7d7 (Soft cyan)
  • Test data: #ffb255 (Soft orange)
  • Accent colors: #f45f74 (Soft pink), #bdd373 (Light green)
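One way to keep the theming consistent while still allowing per-chart differences is to copy the common layout and apply overrides. The sketch below assumes this approach; `COMMON_LAYOUT`, `PALETTE`, and `themed_layout` are hypothetical names, not identifiers from the project.

```python
import copy

COMMON_LAYOUT = {
    "paper_bgcolor": "rgba(0,0,0,0)",
    "plot_bgcolor": "rgba(0,0,0,0)",
    "font": {"color": "#fafafa"},
    "height": 500,
    "xaxis": {"tickangle": -45},
}

PALETTE = {"train": "#98c127", "val": "#8fd7d7", "test": "#ffb255"}


def themed_layout(**overrides):
    """Return a copy of the common layout with chart-specific overrides applied."""
    layout = copy.deepcopy(COMMON_LAYOUT)
    layout.update(overrides)  # shallow: an overridden key replaces the whole value
    return layout


donut_layout = themed_layout(height=300)
print(donut_layout["height"])   # 300
print(COMMON_LAYOUT["height"])  # 500
```

The deep copy keeps the shared dictionary untouched, so one chart's overrides cannot leak into another.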

Image Processing

The visualization utilities use PIL (Pillow) for image processing:
from PIL import Image

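image = Image.open("/path/to/sample.png")

# Resize with high-quality LANCZOS resampling (used for all resize operations)
resized = image.resize((224, 224), Image.Resampling.LANCZOS)

# Color mode conversion
grayscale = resized.convert("L")  # Convert to grayscale
rgb = resized.convert("RGB")  # Convert to RGB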

Error Handling

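All visualization functions include error handling. When loading images, for example, each file is opened inside a try/except so that a missing or corrupt image is reported with st.error: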
try:
    img = Image.open(img_path)
    st.image(img, width="stretch")
    st.caption(f"{img.size[0]}x{img.size[1]}")
except Exception as exception:
    st.error(f"Error: {img_path.name}. {exception}")

Best Practices

Performance Optimization:
# Cache dataset info to avoid repeated scans
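from state.cache import get_dataset_info, set_dataset_info
from utils.dataset_viz import render_class_distribution_chart

if not get_dataset_info():
    # Perform expensive scan (scan_dataset and path stand in for your own scanning code)
    dataset_info = scan_dataset(path)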
    set_dataset_info(dataset_info)  # Cache result

# Use cached data
dataset_info = get_dataset_info()
render_class_distribution_chart(dataset_info)
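The check-then-scan logic above can be wrapped in a small get-or-compute helper. This is a sketch under the assumption that the cache exposes a getter and a setter like those in state.cache; `get_or_scan` is a hypothetical helper, and the in-memory store and `fake_scan` exist only to make the example self-contained.

```python
def get_or_scan(path, get_cached, set_cached, scan):
    """Return cached dataset info, scanning and caching only on a miss."""
    cached = get_cached()
    if cached:
        return cached
    result = scan(path)
    set_cached(result)
    return result


# In-memory stand-ins for the cache and the scanner, for illustration only
_store = {}
scan_calls = []


def fake_scan(path):
    scan_calls.append(path)
    return {"train_samples": {"malware_family_1": 450}}


def cached_info(path):
    return get_or_scan(
        path,
        get_cached=lambda: _store.get("dataset_info"),
        set_cached=lambda info: _store.update(dataset_info=info),
        scan=fake_scan,
    )


first = cached_info("dataset/")
second = cached_info("dataset/")
print(len(scan_calls))  # 1
```

The second call returns the stored result, so the expensive scan runs once.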
Responsive Layouts:
import streamlit as st

# Use columns for side-by-side visualizations
col1, col2 = st.columns(2)

with col1:
    render_class_summary(dataset_info)

with col2:
    # train, val, test: sample counts for each split
    render_split_pie_chart(train, val, test)
Conditional Rendering:
if dataset_info and dataset_info.get("train_samples"):
    render_class_distribution_chart(dataset_info)
else:
    st.warning("No dataset information available")

Integration Example

Complete example showing dataset visualization workflow:
import streamlit as st
from state.cache import get_dataset_info
from utils.dataset_viz import (
    render_class_distribution_chart,
    render_class_summary,
    render_sample_grid,
    render_split_pie_chart,
    render_preprocessing_preview
)

st.header("Dataset Overview")

# Get cached dataset info
dataset_info = get_dataset_info()

if dataset_info:
    # Distribution chart
    st.subheader("Class Distribution")
    render_class_distribution_chart(dataset_info)
    
    # Summary statistics
    col1, col2 = st.columns(2)
    with col1:
        render_class_summary(dataset_info)
    with col2:
        total_train = dataset_info["total_train"]
        total_val = dataset_info["total_val"]
        total_test = dataset_info.get("total_test", 0)
        render_split_pie_chart(total_train, total_val, total_test)
    
    # Sample grid
    st.subheader("Sample Images")
    selected_class = st.selectbox(
        "View samples from:",
        options=["All"] + dataset_info["classes"]
    )
    render_sample_grid(dataset_info, selected_class)
    
    # Preprocessing preview
    st.subheader("Preprocessing Preview")
    sample_paths = dataset_info["sample_paths"]
    # Pick the first sample from the first class that has any
    sample_path = next((paths[0] for paths in sample_paths.values() if paths), None)
    if sample_path is not None:
        render_preprocessing_preview(
            sample_path,
            target_size="224x224",
            color_mode="RGB"
        )
else:
    st.info("Configure your dataset to view visualizations")
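The example above indexes dataset_info directly for several keys, so an incomplete dictionary would raise `KeyError` partway through the page. A small pre-check, sketched here with a hypothetical `missing_keys` helper and a key list inferred from the example, can report what is absent before any rendering starts.

```python
# Keys the integration example reads with direct indexing (inferred, not an official schema)
REQUIRED_KEYS = ("train_samples", "classes", "sample_paths", "total_train", "total_val")


def missing_keys(dataset_info, required=REQUIRED_KEYS):
    """List the required keys that are absent from dataset_info."""
    return [key for key in required if key not in dataset_info]


partial = {"train_samples": {"malware_family_1": 450}, "classes": ["malware_family_1"]}
print(missing_keys(partial))  # ['sample_paths', 'total_train', 'total_val']
```

In a Streamlit page, a non-empty result could be shown with `st.warning` in place of rendering the charts.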