Overview
The visualization API provides utilities for rendering interactive charts, sample grids, and dataset previews using Plotly and Streamlit components.
Class Distribution Visualization
render_class_distribution_chart(dataset_info)
Render an interactive Plotly bar chart showing sample distribution across classes.
Parameters:
- dataset_info (dict): Dataset information dictionary containing train_samples and val_samples.
```python
from utils.dataset_viz import render_class_distribution_chart
from state.cache import get_dataset_info

dataset_info = get_dataset_info()
if dataset_info:
    render_class_distribution_chart(dataset_info)
```
Expected dataset_info Structure:
```python
dataset_info = {
    "train_samples": {
        "malware_family_1": 450,
        "malware_family_2": 380,
        "malware_family_3": 290,
    },
    "val_samples": {
        "malware_family_1": 90,
        "malware_family_2": 75,
        "malware_family_3": 60,
    },
}
```
Chart Features:
- Grouped bar chart (Training vs Validation)
- Custom colors (#98c127 for training, #8fd7d7 for validation)
- 45-degree rotated x-axis labels
- Dark theme with transparent background
- Auto-scales to container width
File reference: app/utils/dataset_viz.py:12
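The chart features above can be reproduced as a plain Plotly figure dictionary. This is a minimal sketch, not the code in dataset_viz.py: the helper name build_class_distribution_figure is hypothetical, and sorting class names for the x-axis is an assumption.

```python
def build_class_distribution_figure(dataset_info: dict) -> dict:
    """Return a Plotly-compatible figure dict for a grouped train/val bar chart.

    Hypothetical helper; it mirrors the documented chart features only.
    """
    train = dataset_info.get("train_samples", {})
    val = dataset_info.get("val_samples", {})
    # Union of class names, sorted for a stable x-axis order (an assumption)
    classes = sorted(set(train) | set(val))
    return {
        "data": [
            {
                "type": "bar",
                "name": "Training",
                "x": classes,
                "y": [train.get(c, 0) for c in classes],
                "marker": {"color": "#98c127"},
            },
            {
                "type": "bar",
                "name": "Validation",
                "x": classes,
                "y": [val.get(c, 0) for c in classes],
                "marker": {"color": "#8fd7d7"},
            },
        ],
        "layout": {
            "barmode": "group",           # side-by-side bars for each class
            "xaxis": {"tickangle": -45},  # 45-degree rotated labels
            "paper_bgcolor": "rgba(0,0,0,0)",
            "plot_bgcolor": "rgba(0,0,0,0)",
            "font": {"color": "#fafafa"},
            "height": 500,
        },
    }
```

To render it, wrap the dictionary in plotly.graph_objects.Figure(...) and pass the result to st.plotly_chart. Classes missing from one split are drawn with a count of 0.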
Class Summary
render_class_summary(dataset_info)
Display top 5 and bottom 5 classes by sample count in a two-column layout.
Parameters:
- dataset_info (dict): Dataset information dictionary containing train_samples.
```python
from utils.dataset_viz import render_class_summary

# dataset_info: dict returned by state.cache.get_dataset_info()
render_class_summary(dataset_info)
```
Output Layout:
┌─────────────────────┬─────────────────────┐
│ Most Common │ Least Common │
├─────────────────────┼─────────────────────┤
│ class_a: 1,245 │ class_x: 45 │
│ class_b: 982 │ class_y: 67 │
│ class_c: 834 │ class_z: 89 │
│ class_d: 756 │ class_w: 102 │
│ class_e: 698 │ class_v: 134 │
└─────────────────────┴─────────────────────┘
File reference: app/utils/dataset_viz.py:55
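The ranking behind the summary layout can be sketched in a few lines. The helpers class_summary and format_row are hypothetical; ordering the least common classes from smallest count upward is inferred from the layout above, and tie-breaking by original dictionary order is an assumption.

```python
def class_summary(train_samples: dict, n: int = 5):
    """Return (most_common, least_common) lists of (class, count) pairs."""
    # Largest counts first for the "Most Common" column
    most_common = sorted(train_samples.items(), key=lambda item: item[1], reverse=True)[:n]
    # Smallest counts first for the "Least Common" column
    least_common = sorted(train_samples.items(), key=lambda item: item[1])[:n]
    return most_common, least_common


def format_row(name: str, count: int) -> str:
    """Format one summary row with a thousands separator, e.g. "class_a: 1,245"."""
    return f"{name}: {count:,}"
```

In Streamlit, each list would be written inside one of the two columns created by st.columns(2).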
Sample Grid
render_sample_grid(dataset_info, selected_class)
Display a grid of sample images with their dimensions.
Parameters:
- dataset_info (dict): Dataset information dictionary containing sample_paths.
- selected_class (str): Class name to display samples from, or “All” for mixed samples.
```python
import streamlit as st
from utils.dataset_viz import render_sample_grid

# dataset_info: dict returned by state.cache.get_dataset_info()
selected_class = st.selectbox(
    "Select Class",
    options=["All"] + dataset_info["classes"],
)
render_sample_grid(dataset_info, selected_class)
```
Expected dataset_info Structure:
```python
from pathlib import Path

dataset_info = {
    "sample_paths": {
        "malware_family_1": [
            Path("/path/to/sample1.png"),
            Path("/path/to/sample2.png"),
            # ... up to 10 samples per class
        ],
        "malware_family_2": [...],
    },
    "classes": ["malware_family_1", "malware_family_2"],
}
```
Grid Features:
- 5-column layout
- Up to 10 samples displayed
- Image dimensions shown below each sample
- Random sampling when “All” is selected (2 per class, max 10 total)
- Error handling for missing/corrupt images
File reference: app/utils/dataset_viz.py:73
Example Output:
┌─────┬─────┬─────┬─────┬─────┐
│ img │ img │ img │ img │ img │
│224x │224x │224x │224x │224x │
│ 224 │ 224 │ 224 │ 224 │ 224 │
├─────┼─────┼─────┼─────┼─────┤
│ img │ img │ img │ img │ img │
│224x │224x │224x │224x │224x │
│ 224 │ 224 │ 224 │ 224 │ 224 │
└─────┴─────┴─────┴─────┴─────┘
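The selection rules listed under Grid Features (2 random samples per class for “All”, at most 10 in total, 5 columns) can be sketched as follows. select_grid_samples and grid_position are hypothetical helpers, and shuffling the pooled picks before truncating to 10 is an assumption about how the cap is applied.

```python
import random

GRID_COLUMNS = 5
MAX_SAMPLES = 10
PER_CLASS_WHEN_ALL = 2


def select_grid_samples(sample_paths: dict, selected_class: str, rng=None) -> list:
    """Pick the sample paths to show in the grid (hypothetical helper)."""
    rng = rng if rng is not None else random.Random()
    if selected_class == "All":
        picked = []
        for paths in sample_paths.values():
            # Up to 2 random samples from each class
            k = min(PER_CLASS_WHEN_ALL, len(paths))
            picked.extend(rng.sample(list(paths), k))
        # Mix classes, then cap the total (assumed order of operations)
        rng.shuffle(picked)
        return picked[:MAX_SAMPLES]
    # A single class: its first 10 samples, or none if the class is unknown
    return list(sample_paths.get(selected_class, []))[:MAX_SAMPLES]


def grid_position(index: int) -> tuple:
    """Map a sample index to (row, column) in a 5-column grid."""
    return divmod(index, GRID_COLUMNS)
```

Passing a seeded random.Random makes the “All” view reproducible, which is convenient when testing the page.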
Data Split Visualization
render_split_pie_chart(train_final, val_final, test_final)
Render a donut chart showing the distribution of data splits.
Parameters:
- train_final (int): Number of training samples
- val_final (int): Number of validation samples
- test_final (int): Number of test samples
```python
from utils.dataset_viz import render_split_pie_chart

train_count = 7000
val_count = 1500
test_count = 1500
render_split_pie_chart(train_count, val_count, test_count)
```
Chart Features:
- Donut chart with 30% hole
- Custom colors:
  - Training: #98c127 (green)
  - Validation: #8fd7d7 (cyan)
  - Test: #ffb255 (orange)
- Compact layout (300px height)
- Dark theme with transparent background
- Percentage labels automatically calculated
File reference: app/utils/dataset_viz.py:104
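A donut chart with these features can be expressed as a Plotly figure dictionary, and the percentages Plotly displays are simply each count divided by the total. Both helpers below are hypothetical sketches of the documented behavior, not the function's source. With the example counts above, the split is 7000 / 10000 = 70% training and 1500 / 10000 = 15% each for validation and test.

```python
SPLIT_COLORS = {"Training": "#98c127", "Validation": "#8fd7d7", "Test": "#ffb255"}


def build_split_donut(train_final: int, val_final: int, test_final: int) -> dict:
    """Return a Plotly-compatible donut chart figure dict (hypothetical helper)."""
    labels = list(SPLIT_COLORS)
    return {
        "data": [{
            "type": "pie",
            "labels": labels,
            "values": [train_final, val_final, test_final],
            "hole": 0.3,  # a 30% hole turns the pie into a donut
            "marker": {"colors": [SPLIT_COLORS[label] for label in labels]},
            "textinfo": "percent",  # Plotly computes the percentages from values
        }],
        "layout": {
            "height": 300,  # compact layout
            "paper_bgcolor": "rgba(0,0,0,0)",
            "plot_bgcolor": "rgba(0,0,0,0)",
            "font": {"color": "#fafafa"},
        },
    }


def split_percentages(train_final: int, val_final: int, test_final: int) -> list:
    """Reproduce the displayed percentages, rounded to one decimal place."""
    total = train_final + val_final + test_final
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [round(100 * v / total, 1) for v in (train_final, val_final, test_final)]
```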
Preprocessing Preview
render_preprocessing_preview(sample_path, target_size, color_mode)
Show before/after comparison of image preprocessing.
Parameters:
- sample_path (Path): Path to the sample image file
- target_size (str): Target size in the format “WIDTHxHEIGHT” (e.g., “224x224”)
- color_mode (str): Color mode, either “RGB” or “Grayscale”
```python
from pathlib import Path
from utils.dataset_viz import render_preprocessing_preview

sample_path = Path("/path/to/sample.png")
render_preprocessing_preview(
    sample_path=sample_path,
    target_size="224x224",
    color_mode="RGB",
)
```
Visual Layout:
┌─────────────────────┬─────────────────────┐
│ Original Image │ After Processing │
├─────────────────────┼─────────────────────┤
│ [Original Image] │ [Processed Image] │
│ Size: 512x512 │ Size: 224x224 │
│ │ Mode: RGB │
└─────────────────────┴─────────────────────┘
Preprocessing Operations:
- Resize to target dimensions using LANCZOS resampling
- Convert to grayscale if color_mode is “Grayscale”
- Display size and mode information
File reference: app/utils/dataset_viz.py:126
Example with Grayscale:
```python
from pathlib import Path

render_preprocessing_preview(
    sample_path=Path("dataset/class_a/sample_001.png"),
    target_size="128x128",
    color_mode="Grayscale",
)
```
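The preprocessing operations listed above can be sketched by parsing target_size into a (width, height) tuple and mapping color_mode to a Pillow mode string. parse_target_size, PIL_MODES, and preprocess are hypothetical names. Converting to “RGB” when RGB is requested (which also normalizes RGBA or palette images) is an assumption; the documented behavior only specifies the grayscale conversion.

```python
def parse_target_size(target_size: str) -> tuple:
    """Parse "WIDTHxHEIGHT" (e.g. "224x224") into (width, height)."""
    width, height = target_size.lower().split("x")
    return int(width), int(height)


# Documented color_mode values mapped to Pillow mode strings
PIL_MODES = {"RGB": "RGB", "Grayscale": "L"}


def preprocess(sample_path, target_size: str, color_mode: str):
    """Resize with LANCZOS, then convert the color mode (requires Pillow 9.1+)."""
    from PIL import Image  # imported lazily so the parser above has no dependency

    with Image.open(sample_path) as img:
        # resize() loads the pixels and returns a new, independent image
        resized = img.resize(parse_target_size(target_size), Image.Resampling.LANCZOS)
    return resized.convert(PIL_MODES[color_mode])
```

Note that Pillow's resize takes (width, height), which matches the WIDTHxHEIGHT order of target_size.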
Plotly Configuration
All visualization functions share a consistent Plotly theme. The common layout settings are:
```python
# Common layout settings
layout = {
    "paper_bgcolor": "rgba(0,0,0,0)",  # Transparent background
    "plot_bgcolor": "rgba(0,0,0,0)",   # Transparent plot area
    "font": {"color": "#fafafa"},      # Light text color
    "height": 500,                     # Chart height (the split pie chart uses 300)
    "xaxis": {"tickangle": -45},       # Rotated x-axis labels (bar chart)
}
```
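One way to keep this theme consistent across charts is a shared base layout that each chart copies and extends with its own settings. This is a sketch under that assumption; BASE_LAYOUT and themed_layout are hypothetical names, not part of utils.dataset_viz.

```python
import copy

# Settings shared by every chart (hypothetical constant)
BASE_LAYOUT = {
    "paper_bgcolor": "rgba(0,0,0,0)",
    "plot_bgcolor": "rgba(0,0,0,0)",
    "font": {"color": "#fafafa"},
}


def themed_layout(**overrides) -> dict:
    """Return a deep copy of the shared theme with per-chart overrides applied."""
    layout = copy.deepcopy(BASE_LAYOUT)  # callers cannot mutate the shared theme
    layout.update(overrides)
    return layout
```

For example, themed_layout(height=500, xaxis={"tickangle": -45}) matches the class distribution chart, and themed_layout(height=300) matches the split pie chart.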
Color Palette:
- Training data: #98c127 (soft green)
- Validation data: #8fd7d7 (soft cyan)
- Test data: #ffb255 (soft orange)
- Accent colors: #f45f74 (soft pink), #bdd373 (light green)
Image Processing
The visualization utilities use PIL (Pillow) for image processing:
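```python
from PIL import Image

image = Image.open("sample.png")  # any sample image path
# Resize with high-quality LANCZOS resampling (used for all resize operations)
resized = image.resize((224, 224), Image.Resampling.LANCZOS)
# Color mode conversion
grayscale = resized.convert("L")  # Convert to grayscale
rgb = resized.convert("RGB")      # Convert to RGB
```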
Error Handling
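All visualization functions include error handling. For example, each sample image is loaded inside a try/except block so that a missing or corrupt file shows an inline error instead of stopping the page:
```python
import streamlit as st
from PIL import Image

# img_path: pathlib.Path of one sample image
try:
    img = Image.open(img_path)
    st.image(img, width="stretch")
    st.caption(f"{img.size[0]}x{img.size[1]}")  # e.g. "224x224"
except Exception as exception:
    st.error(f"Error: {img_path.name}. {exception}")
```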
Best Practices
Performance Optimization:
```python
# Cache dataset info to avoid repeated scans
from state.cache import get_dataset_info, set_dataset_info
from utils.dataset_viz import render_class_distribution_chart

if not get_dataset_info():
    # Perform expensive scan (scan_dataset and path are defined elsewhere)
    dataset_info = scan_dataset(path)
    set_dataset_info(dataset_info)  # Cache result

# Use cached data
dataset_info = get_dataset_info()
render_class_distribution_chart(dataset_info)
```
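The internals of state.cache are not documented here. A plausible pattern, assuming the cache lives in a dict-like store such as st.session_state, is a get-or-compute helper that runs the scan only on a cache miss. get_or_scan is a hypothetical name, and treating an empty result as a miss mirrors the `if not get_dataset_info()` check above.

```python
from typing import Callable, MutableMapping


def get_or_scan(store: MutableMapping, key: str, scan: Callable[[], dict]) -> dict:
    """Return the cached value under key, calling scan() only on a cache miss."""
    cached = store.get(key)
    if cached:  # None or an empty dict both count as a miss
        return cached
    result = scan()
    store[key] = result  # cache for later reruns
    return result
```

Inside a Streamlit app this could be called as get_or_scan(st.session_state, "dataset_info", lambda: scan_dataset(path)), where scan_dataset and path are the same undefined placeholders as in the snippet above.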
Responsive Layouts:
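```python
import streamlit as st
from utils.dataset_viz import render_class_summary, render_split_pie_chart

# Use columns for side-by-side visualizations
col1, col2 = st.columns(2)
with col1:
    render_class_summary(dataset_info)
with col2:
    render_split_pie_chart(train, val, test)  # sample counts per split
```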
Conditional Rendering:
```python
import streamlit as st
from utils.dataset_viz import render_class_distribution_chart

if dataset_info and dataset_info.get("train_samples"):
    render_class_distribution_chart(dataset_info)
else:
    st.warning("No dataset information available")
```
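The check above can be generalized so the warning names exactly which fields are missing. missing_fields is a hypothetical helper; requiring both train_samples and val_samples follows the parameter description of render_class_distribution_chart.

```python
REQUIRED_FOR_DISTRIBUTION = ("train_samples", "val_samples")


def missing_fields(dataset_info, required=REQUIRED_FOR_DISTRIBUTION) -> list:
    """List required keys that are absent or empty in dataset_info."""
    if not dataset_info:
        return list(required)
    return [key for key in required if not dataset_info.get(key)]
```

A page could then call st.warning(f"Missing dataset fields: {', '.join(missing)}") when the returned list is non-empty, and render the chart otherwise.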
Integration Example
Complete example showing dataset visualization workflow:
```python
import streamlit as st
from state.cache import get_dataset_info
from utils.dataset_viz import (
    render_class_distribution_chart,
    render_class_summary,
    render_sample_grid,
    render_split_pie_chart,
    render_preprocessing_preview,
)

st.header("Dataset Overview")

# Get cached dataset info
dataset_info = get_dataset_info()

if dataset_info:
    # Distribution chart
    st.subheader("Class Distribution")
    render_class_distribution_chart(dataset_info)

    # Summary statistics
    col1, col2 = st.columns(2)
    with col1:
        render_class_summary(dataset_info)
    with col2:
        total_train = dataset_info["total_train"]
        total_val = dataset_info["total_val"]
        total_test = dataset_info.get("total_test", 0)
        render_split_pie_chart(total_train, total_val, total_test)

    # Sample grid
    st.subheader("Sample Images")
    selected_class = st.selectbox(
        "View samples from:",
        options=["All"] + dataset_info["classes"],
    )
    render_sample_grid(dataset_info, selected_class)

    # Preprocessing preview
    st.subheader("Preprocessing Preview")
    sample_paths = dataset_info["sample_paths"]
    if sample_paths:
        first_class = list(sample_paths.keys())[0]
        sample_path = sample_paths[first_class][0]
        render_preprocessing_preview(
            sample_path,
            target_size="224x224",
            color_mode="RGB",
        )
else:
    st.info("Configure your dataset to view visualizations")
```
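The integration example reads total_train and total_val, which do not appear in the dataset_info structures documented for the individual functions. If your dataset_info only carries per-class counts, the totals can be derived from them. with_split_totals is a hypothetical helper, and defaulting total_test to 0 matches the .get("total_test", 0) fallback above.

```python
def with_split_totals(dataset_info: dict) -> dict:
    """Return a copy of dataset_info with split totals filled in where absent."""
    info = dict(dataset_info)  # shallow copy; the cached dict is left untouched
    if "total_train" not in info:
        info["total_train"] = sum(info.get("train_samples", {}).values())
    if "total_val" not in info:
        info["total_val"] = sum(info.get("val_samples", {}).values())
    # No per-class test counts are documented, so default to 0
    info.setdefault("total_test", 0)
    return info
```

With the example structure from the class distribution section, this yields total_train = 450 + 380 + 290 = 1120 and total_val = 90 + 75 + 60 = 225.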