

Overview

The Interpretability page (/interpretability) provides tools to understand how and why your trained model makes predictions. Use Grad-CAM for visual attention, t-SNE for embedding visualization, and detailed misclassification analysis.
Interpretability tools require a completed experiment with a trained model. Complete training first before accessing this page.

Page Structure

Experiment Selection:
  • Dropdown at top to select completed experiment
  • Shows experiment name and model used
Five Tabs:
  1. Architecture - Model structure review
  2. Misclassifications - Analyze prediction errors
  3. Embeddings - t-SNE visualization of learned features
  4. Grad-CAM - Visual attention heatmaps
  5. Advanced - Additional interpretability techniques

Tab 1: Architecture Review

Review the model architecture used in the selected experiment.

Model Summary

Displays:
  • Model Name: From model library
  • Model Type: Custom CNN, Transformer, or Transfer Learning
  • Total Parameters: Trainable + non-trainable
  • Trainable Parameters: Updated during training
  • Non-Trainable Parameters: Frozen weights (transfer learning)
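For reference, these counts can also be reproduced outside the UI. A minimal sketch, assuming the underlying model is a Keras model object (the framework and the `model` variable are assumptions):

```python
import tensorflow as tf

# Assumes `model` is a trained tf.keras.Model (framework is an assumption).
trainable = sum(int(tf.size(w)) for w in model.trainable_weights)
non_trainable = sum(int(tf.size(w)) for w in model.non_trainable_weights)

print(f"Total parameters:         {trainable + non_trainable:,}")
print(f"Trainable parameters:     {trainable:,}")
print(f"Non-trainable parameters: {non_trainable:,}")
```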

Layer-by-Layer Breakdown

Convolutional Blocks:
  • Layer name (e.g., “Conv2D_1”)
  • Filters, kernel size, activation
  • Batch normalization status
  • Pooling type and size
  • Dropout rate
Dense Layers:
  • Units and activation
  • Dropout rate
Output Layer:
  • Number of classes
  • Softmax activation
Use this tab to verify the exact architecture that was trained, especially when comparing multiple experiments.

Tab 2: Misclassifications

Analyze prediction errors to understand model weaknesses.

Error Type Filter

Select which misclassifications to view:
  • All Errors: Every incorrect prediction
  • By True Class: Filter errors for specific malware family
  • By Predicted Class: Filter by what the model predicted
  • Confidence Threshold: Show only high-confidence errors (model was confident but wrong)
Misclassified samples are displayed in a grid. Each image shows:
  • Original Image: Misclassified sample
  • True Label: Actual malware family (green)
  • Predicted Label: Model’s prediction (red)
  • Confidence: Softmax probability for predicted class
  • Top-3 Predictions: Model’s top 3 choices with probabilities
Example:
Image: malware_sample_1234.png
True: Ramnit
Predicted: Lollipop (85% confidence)
Top-3:
  1. Lollipop: 85%
  2. Ramnit: 12%
  3. Kelihos: 3%
High-confidence errors (>80%) are most concerning. They indicate systematic confusion that the model is confident about, suggesting visual similarity or data issues.
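To locate such cases programmatically, a minimal sketch assuming the softmax outputs and true labels are available as NumPy arrays (the variable names are illustrative, not part of the platform's API):

```python
import numpy as np

# probs: (n_samples, n_classes) softmax outputs; y_true: (n_samples,) integer labels.
y_pred = probs.argmax(axis=1)
confidence = probs.max(axis=1)

# High-confidence errors: wrong prediction with softmax probability above 0.80.
suspect = np.where((y_pred != y_true) & (confidence > 0.80))[0]
for idx in suspect:
    top3 = probs[idx].argsort()[::-1][:3]
    print(f"sample {idx}: true={y_true[idx]}, predicted={y_pred[idx]} "
          f"({confidence[idx]:.0%}), top-3 classes={top3.tolist()}")
```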

Error Analysis Summary

Metrics displayed:
  • Total Misclassifications: Count of errors
  • Error Rate: Percentage of test set misclassified
  • Most Confused Pair: Which two classes are most often swapped
  • Worst Performing Class: Class with lowest recall
Use misclassification analysis to guide data collection. If specific pairs are confused, collect more distinguishing samples or increase augmentation for those classes.
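A hedged sketch of how these summary metrics can be derived from a confusion matrix with scikit-learn (array names are illustrative):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# y_true, y_pred: integer class labels over the test set (names are assumptions).
cm = confusion_matrix(y_true, y_pred)

errors = int(cm.sum() - np.trace(cm))
error_rate = errors / cm.sum()

# Most confused pair: largest off-diagonal cell (true class, predicted class).
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
true_cls, pred_cls = np.unravel_index(off_diag.argmax(), off_diag.shape)

# Worst performing class: lowest recall (diagonal / row sum).
recall = np.diag(cm) / cm.sum(axis=1)
worst_class = int(recall.argmin())

print(f"Misclassifications: {errors}, error rate: {error_rate:.2%}")
print(f"Most confused pair: true {true_cls} -> predicted {pred_cls}")
print(f"Worst performing class: {worst_class} (recall {recall.min():.2%})")
```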

Tab 3: Embeddings

Visualize learned feature representations using dimensionality reduction.

t-SNE Visualization

What is t-SNE?
  • t-Distributed Stochastic Neighbor Embedding
  • Projects high-dimensional features to 2D for visualization
  • Preserves local structure (similar samples cluster together)
Chart Display:
  • Each point: One test sample
  • Color: True class label
  • Position: 2D projection of learned features
  • Clusters: Samples from same class should cluster
Embeddings are extracted from the second-to-last layer (before softmax), representing the model’s learned feature space.
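As a rough sketch of this extraction step, assuming a Keras model and a batch of preprocessed test images (both the framework and the variable names are assumptions):

```python
import tensorflow as tf

# model: trained tf.keras.Model; x_test: preprocessed image batch (both assumed).
# Build a sub-model that stops at the second-to-last layer (before softmax).
embedding_model = tf.keras.Model(
    inputs=model.input,
    outputs=model.layers[-2].output,
)
embeddings = embedding_model.predict(x_test)  # shape: (n_samples, n_features)
```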

Interpreting t-SNE Plots

Good sign:
  • Each class forms distinct cluster
  • Minimal overlap between classes
  • Clear boundaries
Indicates: Model learned discriminative features. Example: the Ramnit cluster sits far from the Lollipop cluster.
Hover over points to see sample details: filename, true label, predicted label, confidence.

Configuration Options

t-SNE Parameters:
  • Perplexity: 5-50 (default: 30)
    • Higher = considers more neighbors
    • Lower = focuses on local structure
  • Learning Rate: 10-1000 (default: 200)
  • Iterations: 250-5000 (default: 1000)
t-SNE is non-deterministic. Running multiple times produces different layouts, but cluster structure should remain consistent.
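A minimal sketch of running t-SNE over the extracted embeddings with these defaults, using scikit-learn (the library choice is an assumption):

```python
from sklearn.manifold import TSNE

# embeddings: (n_samples, n_features) array from the penultimate layer (see above).
tsne = TSNE(
    n_components=2,     # project to 2D for plotting
    perplexity=30,      # default shown in the UI; typical range 5-50
    learning_rate=200,  # default shown in the UI
    # Iterations default to 1000; the keyword is `n_iter` in older scikit-learn
    # releases and `max_iter` in newer ones.
)
points_2d = tsne.fit_transform(embeddings)  # shape: (n_samples, 2)
```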

Tab 4: Grad-CAM

Gradient-weighted Class Activation Mapping - visualize where the model “looks” when making predictions.

What is Grad-CAM?

Grad-CAM uses gradients flowing into the last convolutional layer to produce a heatmap showing which regions of the image contributed most to the prediction.
  • Red areas: High importance (model focuses here)
  • Blue areas: Low importance (model ignores)
  • Overlay on image: Shows attention directly on input
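The core computation described above can be sketched roughly as follows, assuming a Keras CNN and the name of its last convolutional layer (both the framework and the layer name are assumptions, not the page's actual implementation):

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    """Return a (H, W) heatmap in [0, 1] for one preprocessed image."""
    grad_model = tf.keras.Model(
        inputs=model.input,
        outputs=[model.get_layer(last_conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        class_score = preds[:, class_index]

    # Gradients of the class score w.r.t. the last conv layer's feature maps.
    grads = tape.gradient(class_score, conv_out)
    # Channel weights: global-average-pool the gradients over spatial dimensions.
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted sum of feature maps, ReLU, then normalize to [0, 1].
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    cam = cam / (tf.reduce_max(cam) + 1e-8)
    return cam.numpy()
```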

Interface

1. Select Sample
  • True Class Dropdown: Filter by actual malware family
  • Sample Selector: Choose specific image from class
  • Prediction/Correct Filter: Show only correct or incorrect predictions
2. View Visualization - three-panel display:
  1. Original Image: Input image
  2. Grad-CAM Heatmap: Attention heatmap (red = important)
  3. Overlay: Heatmap superimposed on image
3. Analyze Attention
  • Prediction: Model’s predicted class and confidence
  • True Label: Actual class
  • Top-3 Predictions: Alternative predictions with confidences

Interpreting Grad-CAM Heatmaps

Good attention:
  • Heatmap highlights relevant image regions
  • Consistent patterns across samples from same class
  • Focuses on discriminative features
Example: For Ramnit malware, model focuses on characteristic header structure
If Grad-CAM shows consistent attention on non-malware features (e.g., image borders, watermarks), your model may have learned dataset biases instead of malware characteristics.

Grad-CAM Options

Layer Selection:
  • Last Conv Layer (default): Broadest semantic understanding
  • Earlier Layers: More localized, fine-grained attention
Colormap:
  • Jet: Red (important) to blue (unimportant)
  • Viridis: Purple to yellow
  • Hot: Black to red to white
Compare Grad-CAM across correctly and incorrectly classified samples to identify what the model attends to in each case.
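A small matplotlib sketch of producing the three-panel view with a selectable colormap; the `image` and `heatmap` names are illustrative, and the heatmap is assumed to already be upsampled to the image size:

```python
import matplotlib.pyplot as plt

# image: (H, W) or (H, W, 3) input array; heatmap: (H, W) Grad-CAM map in [0, 1].
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

axes[0].imshow(image, cmap="gray")
axes[0].set_title("Original")

axes[1].imshow(heatmap, cmap="jet")  # "viridis" or "hot" are the other options
axes[1].set_title("Grad-CAM heatmap")

axes[2].imshow(image, cmap="gray")
axes[2].imshow(heatmap, cmap="jet", alpha=0.4)  # semi-transparent overlay
axes[2].set_title("Overlay")

for ax in axes:
    ax.axis("off")
plt.tight_layout()
plt.show()
```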

Tab 5: Advanced

Additional interpretability techniques.

Feature Importance

For Custom CNNs:
  • Filter Visualizations: What patterns each convolutional filter detects
  • Activation Maximization: Synthetic images that maximally activate specific neurons
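As an illustrative sketch of filter visualization (not necessarily how the page computes it), assuming a Keras custom CNN whose first layer is a Conv2D:

```python
import matplotlib.pyplot as plt

# model: trained tf.keras.Model whose first layer is a Conv2D (an assumption).
filters, biases = model.layers[0].get_weights()  # filters: (kh, kw, in_ch, out_ch)

# Normalize to [0, 1] for display and plot up to 16 filters (first input channel).
filters = (filters - filters.min()) / (filters.max() - filters.min() + 1e-8)
n = min(16, filters.shape[-1])
fig, axes = plt.subplots(4, 4, figsize=(6, 6))
for i, ax in enumerate(axes.flat):
    if i < n:
        ax.imshow(filters[:, :, 0, i], cmap="gray")
    ax.axis("off")
plt.show()
```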

Attention Rollout (Transformers Only)

For Vision Transformers:
  • Attention Weights: Which patches the model attends to
  • Rollout Visualization: Aggregated attention across all layers
  • Per-Head Analysis: Different attention heads focus on different features
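A rough NumPy sketch of attention rollout under common simplifying assumptions (attention averaged over heads, residual connections modeled by adding the identity); the shape of `attentions` is an assumption about how the transformer exposes its weights:

```python
import numpy as np

def attention_rollout(attentions):
    """attentions: list of (num_heads, num_tokens, num_tokens) arrays, one per layer.
    Returns a (num_tokens, num_tokens) matrix of aggregated attention."""
    rollout = np.eye(attentions[0].shape[-1])
    for layer_attn in attentions:
        attn = layer_attn.mean(axis=0)                  # average over heads
        attn = attn + np.eye(attn.shape[-1])            # account for residual connection
        attn = attn / attn.sum(axis=-1, keepdims=True)  # re-normalize rows
        rollout = attn @ rollout                        # propagate through layers
    return rollout

# rollout[0, 1:] then gives the CLS token's aggregated attention over image patches.
```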

Saliency Maps

Gradient-based saliency:
  • Compute gradient of output w.r.t. input pixels
  • Shows which pixels, if changed, would most affect prediction
  • Finer-grained than Grad-CAM
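A minimal sketch of a vanilla gradient saliency map, assuming a Keras model and one preprocessed image (names are illustrative):

```python
import numpy as np
import tensorflow as tf

# model: trained tf.keras.Model; image: one preprocessed (H, W, C) array (assumed).
x = tf.convert_to_tensor(image[np.newaxis, ...], dtype=tf.float32)
with tf.GradientTape() as tape:
    tape.watch(x)
    preds = model(x)
    class_score = preds[:, int(tf.argmax(preds[0]))]

# Gradient of the predicted class score w.r.t. the input pixels.
grads = tape.gradient(class_score, x)
saliency = tf.reduce_max(tf.abs(grads), axis=-1)[0]  # (H, W) per-pixel importance
```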

Integrated Gradients

Path-based attribution:
  • Computes gradient along path from baseline to input
  • More accurate attribution than simple gradients
  • Shows pixel-level importance
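A simplified sketch of integrated gradients using an all-zero baseline and a Riemann-sum approximation of the path integral (the framework and baseline choice are assumptions):

```python
import tensorflow as tf

def integrated_gradients(model, image, class_index, steps=50):
    """image: preprocessed (H, W, C) array. Returns per-pixel attributions."""
    baseline = tf.zeros_like(image, dtype=tf.float32)  # black-image baseline
    image = tf.cast(image, tf.float32)

    # Interpolate between baseline and input along a straight-line path.
    alphas = tf.reshape(tf.linspace(0.0, 1.0, steps), (-1, 1, 1, 1))
    interpolated = baseline + alphas * (image - baseline)  # (steps, H, W, C)

    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        preds = model(interpolated)
        scores = preds[:, class_index]
    grads = tape.gradient(scores, interpolated)

    # Average gradients along the path and scale by the input difference.
    avg_grads = tf.reduce_mean(grads, axis=0)
    return ((image - baseline) * avg_grads).numpy()
```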
Advanced techniques provide deeper insights but require more computation. Start with Grad-CAM and t-SNE for initial analysis.

Use Cases

Debugging Poor Performance

1. Check Misclassifications: Identify which classes are confused
2. View Embeddings: Confirm whether confused classes overlap in feature space
3. Analyze Grad-CAM: Verify the model focuses on relevant features, not artifacts
4. Review Architecture: Ensure the model has sufficient capacity for the task

Validating Model Behavior

1. Grad-CAM on Correct Predictions: Verify the model attends to malware-specific regions
2. Embeddings Clustering: Confirm classes are well-separated
3. Error Analysis: Check that errors make sense (confused classes are actually similar)

Identifying Data Issues

1. Grad-CAM Artifact Detection: Look for consistent attention on watermarks or borders
2. Outlier Detection in t-SNE: Find scattered points far from their class cluster (potential mislabels)
3. High-Confidence Errors: Investigate samples the model is confident about but gets wrong (possible data quality issues)

Tips & Best Practices

Start with t-SNE: Quick overview of whether model learned separable features.
Use Grad-CAM for Debugging: If performance is poor, check if model focuses on relevant features.
Analyze Errors First: Understanding misclassifications is more valuable than confirming correct predictions.
Grad-CAM highlights correlations, not causation. High attention doesn’t mean that region caused the prediction, only that it correlates.
Cross-Reference Tools: Use multiple techniques together (e.g., t-SNE shows overlap → Grad-CAM shows why → Misclassifications show which samples).

Limitations

Grad-CAM

  • Only works with CNNs (requires convolutional layers)
  • Coarse spatial resolution
  • May miss fine-grained details
  • Alternative: Integrated Gradients for finer detail

t-SNE

  • Non-deterministic (different runs produce different layouts)
  • Computationally expensive for large datasets
  • Hyperparameter-sensitive (perplexity affects structure)
  • Alternative: UMAP for faster results (still stochastic unless seeded)

Attention for Transformers

  • Attention weights ≠ importance (attention is not explanation)
  • Multiple heads may attend to different features
  • Requires specialized visualization tools

Summary

The Interpretability page provides essential tools for understanding your trained model:

  • Architecture Review: Verify exact model structure and parameters
  • Misclassifications: Identify and analyze prediction errors
  • Embeddings (t-SNE): Visualize learned feature space clustering
  • Grad-CAM: See where the model focuses attention
Use these tools to:
  • Validate model behavior
  • Debug poor performance
  • Identify data quality issues
  • Understand prediction reasoning
Interpretability is crucial for deploying ML models in production. Always validate that your model makes predictions for the right reasons, not spurious correlations.