

UC Intel Final - Malware Classification Platform

An advanced ensemble machine learning platform for classifying malware using deep learning techniques with PyTorch. This project provides a professional, multi-page Streamlit dashboard for building, training, and evaluating malware classification models.

What is Malware Image Classification?

Malware binaries can be visualized as grayscale images: each byte (a value from 0 to 255) becomes the intensity of one pixel, and the byte sequence is typically wrapped into rows of a fixed width to form a 2D image. This visual representation allows deep learning models to detect patterns and classify malware families based on their structural characteristics.
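The byte-to-pixel mapping can be sketched in a few lines of NumPy. This is a minimal illustration, not the platform's preprocessing code: the function name `bytes_to_grayscale`, the default row width of 256 pixels, and zero-padding the last row are all assumptions.

```python
import numpy as np


def bytes_to_grayscale(data: bytes, width: int = 256) -> np.ndarray:
    """Map each byte of a binary to one grayscale pixel (0 = black, 255 = white).

    The byte stream is wrapped into rows of `width` pixels; the last row is
    zero-padded so the result is a full 2D uint8 array.
    """
    pixels = np.frombuffer(data, dtype=np.uint8)
    height = -(-len(pixels) // width)  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(pixels)] = pixels
    return padded.reshape(height, width)


# Usage (the file path is hypothetical):
# with open("sample.exe", "rb") as f:
#     image = bytes_to_grayscale(f.read())
```

A fixed width is the simplest choice; images of different heights still need resizing or cropping before they can be batched for a CNN.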

Platform Overview

The UC Intel Final platform provides a complete end-to-end solution for:

Dataset Management

Configure train/validation/test splits, apply preprocessing, and set up data augmentation pipelines
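One way to produce reproducible train/validation/test splits in plain PyTorch is `torch.utils.data.random_split` with a seeded generator. The 70/15/15 ratios, the seed, and the helper name `split_dataset` are assumptions for illustration; the platform's Dataset page may split differently (for example, stratified by malware family).

```python
import torch
from torch.utils.data import TensorDataset, random_split


def split_dataset(dataset, train=0.7, val=0.15, seed=42):
    """Split a dataset into train/validation/test subsets by fraction.

    The test fraction is whatever remains after train and val; the seeded
    generator makes the split identical across runs.
    """
    n = len(dataset)
    n_train = int(n * train)
    n_val = int(n * val)
    n_test = n - n_train - n_val
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)


# Usage with a toy dataset of 100 random 64x64 grayscale "images" and 9 labels.
images = torch.rand(100, 1, 64, 64)
labels = torch.randint(0, 9, (100,))
train_set, val_set, test_set = split_dataset(TensorDataset(images, labels))
```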

Model Builder

Design custom CNNs, use transfer learning with pre-trained models, or build transformer architectures

Training Pipeline

Train with customizable hyperparameters, live monitoring, and automatic checkpointing
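Automatic checkpointing can be pictured as a small tracker that keeps the best weights seen so far and, combined with the patience-based early stopping listed under the training engine, decides when to stop. The class name, the `min_delta` parameter, and monitoring validation loss (rather than, say, F1-score) are assumptions; the platform's `training/engine.py` may work differently.

```python
import copy
import math


class EarlyStopping:
    """Track validation loss, keep a copy of the best state, and signal a stop
    after `patience` epochs without an improvement larger than `min_delta`."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = math.inf
        self.best_state = None
        self.bad_epochs = 0

    def step(self, val_loss: float, state: dict) -> bool:
        """Record one epoch's result; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            # Deep-copy so later optimizer steps do not mutate the checkpoint.
            self.best_state = copy.deepcopy(state)
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Typical use in a loop (model, train_one_epoch, and evaluate are hypothetical):
# stopper = EarlyStopping(patience=5)
# for epoch in range(max_epochs):
#     train_one_epoch(model)
#     if stopper.step(evaluate(model), model.state_dict()):
#         break
# torch.save(stopper.best_state, "best_model.pt")
```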

Model Interpretability

Visualize model decisions with Grad-CAM, analyze misclassifications, and explore embeddings
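Grad-CAM weights a convolutional layer's activation maps by the spatially averaged gradients of the target class score, sums them, and applies a ReLU. The function below is a minimal hook-based sketch of that idea, not the platform's interpretability module; the function name and signature are assumptions, and choosing the layer to inspect is left to the caller.

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


def grad_cam(model: nn.Module, target_layer: nn.Module, image: torch.Tensor,
             class_idx: Optional[int] = None) -> torch.Tensor:
    """Grad-CAM heatmap in [0, 1] for one image of shape (C, H, W).

    The map is resized to the image's height and width. If `class_idx` is
    None, the model's top predicted class is explained.
    """
    captured = {}

    def forward_hook(module, inputs, output):
        captured["acts"] = output
        # Tensor hook: stores d(score)/d(activations) during backward.
        output.register_hook(lambda grad: captured.__setitem__("grads", grad))

    handle = target_layer.register_forward_hook(forward_hook)
    try:
        model.eval()
        logits = model(image.unsqueeze(0))
        if class_idx is None:
            class_idx = int(logits.argmax(dim=1))
        model.zero_grad()
        logits[0, class_idx].backward()
    finally:
        handle.remove()

    acts = captured["acts"][0].detach()  # (K, h, w) feature maps
    grads = captured["grads"][0]         # (K, h, w) gradients
    weights = grads.mean(dim=(1, 2))     # one importance weight per channel
    cam = F.relu((weights[:, None, None] * acts).sum(dim=0))
    cam = F.interpolate(cam[None, None], size=tuple(image.shape[-2:]),
                        mode="bilinear", align_corners=False)[0, 0]
    return cam / (cam.max() + 1e-8)
```

For a torchvision ResNet, `model.layer4` is a common choice of `target_layer`; the heatmap can then be overlaid on the malware image to show which byte regions drove the prediction.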

Key Features

Professional Streamlit Dashboard

  • Multi-page architecture with self-contained modules
  • Theme customization with color presets and CSS injection
  • Session management for saving and resuming work
  • Real-time training monitoring with live metrics updates

Flexible Model Architectures

```python
# Build custom architectures layer by layer
from models.pytorch.cnn_builder import CustomCNNBuilder

config = {
    "cnn_config": {
        # Layers are listed in forward order, from the input image onward.
        "layers": [
            {"type": "Conv2D", "filters": 64, "kernel_size": 3, "activation": "relu"},
            {"type": "MaxPool", "pool_size": 2},
            {"type": "Conv2D", "filters": 128, "kernel_size": 3, "activation": "relu"},
            {"type": "Flatten"},  # feature maps -> 1D vector for the dense layer
            {"type": "Dense", "units": 256, "activation": "relu"},
        ]
    },
    "num_classes": 9,  # number of malware families to predict
}

builder = CustomCNNBuilder(config)
model = builder.build()
```
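To make the layer-list format concrete, here is a hypothetical, simplified translator that maps each entry to a `torch.nn` module. It is not `CustomCNNBuilder`'s actual implementation: the single-channel input, unpadded convolutions, `LazyLinear` layers, and the appended output layer sized by `num_classes` are all assumptions of this sketch.

```python
import torch
import torch.nn as nn

ACTIVATIONS = {"relu": nn.ReLU, "gelu": nn.GELU, "tanh": nn.Tanh}


def build_from_config(config: dict, in_channels: int = 1) -> nn.Sequential:
    """Translate a layer list in the format above into an nn.Sequential."""
    layers = []
    channels = in_channels
    for spec in config["cnn_config"]["layers"]:
        kind = spec["type"]
        if kind == "Conv2D":
            layers.append(nn.Conv2d(channels, spec["filters"], spec["kernel_size"]))
            channels = spec["filters"]
        elif kind == "MaxPool":
            layers.append(nn.MaxPool2d(spec["pool_size"]))
        elif kind == "Flatten":
            layers.append(nn.Flatten())
        elif kind == "Dense":
            # LazyLinear infers its input size on the first forward pass.
            layers.append(nn.LazyLinear(spec["units"]))
        else:
            raise ValueError(f"Unsupported layer type: {kind}")
        if spec.get("activation"):
            layers.append(ACTIVATIONS[spec["activation"]]())
    # Output head: one logit per class.
    layers.append(nn.LazyLinear(config["num_classes"]))
    return nn.Sequential(*layers)


# Usage with a shortened config and an assumed 64x64 grayscale input.
example = {"cnn_config": {"layers": [
    {"type": "Conv2D", "filters": 16, "kernel_size": 3, "activation": "relu"},
    {"type": "MaxPool", "pool_size": 2},
    {"type": "Flatten"},
    {"type": "Dense", "units": 32, "activation": "relu"},
]}, "num_classes": 9}
net = build_from_config(example)
out = net(torch.rand(4, 1, 64, 64))
```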

Comprehensive Training Engine

The training pipeline includes:
  • Multiple optimizers: Adam, AdamW, SGD with Momentum, RMSprop
  • Learning rate schedulers: ReduceLROnPlateau, Cosine Annealing, Step Decay, Exponential
  • Class imbalance handling: Auto class weights, Focal Loss
  • Early stopping with configurable patience
  • Automatic checkpointing for best models
  • Real-time metrics: Loss, accuracy, precision, recall, F1-score
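The two class-imbalance options can be sketched as follows: "balanced" inverse-frequency class weights (the same n_samples / (n_classes × count) heuristic scikit-learn uses for `class_weight="balanced"`) and a multi-class focal loss that down-weights well-classified examples. How the platform actually computes its auto class weights or configures focal loss is an assumption here, and `gamma=2.0` is simply a common default.

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


def balanced_class_weights(labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """n_samples / (n_classes * count_c): rarer classes get larger weights."""
    counts = torch.bincount(labels, minlength=num_classes).float()
    counts = counts.clamp(min=1.0)  # avoid division by zero for absent classes
    return labels.numel() / (num_classes * counts)


class FocalLoss(nn.Module):
    """Multi-class focal loss: mean of -w_t * (1 - p_t)**gamma * log(p_t)."""

    def __init__(self, gamma: float = 2.0, weight: Optional[torch.Tensor] = None):
        super().__init__()
        self.gamma = gamma
        # A buffer follows the module across .to(device) calls; None is allowed.
        self.register_buffer("weight", weight)

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        log_probs = F.log_softmax(logits, dim=1)
        # Log-probability assigned to each sample's true class.
        log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()
        loss = -((1.0 - pt) ** self.gamma) * log_pt
        if self.weight is not None:
            loss = loss * self.weight[targets]
        return loss.mean()
```

With `gamma=0` and no weights this reduces to ordinary cross-entropy. With weights, it averages over samples, whereas `nn.CrossEntropyLoss(weight=...)` divides by the sum of the per-sample weights, so the two values are not directly comparable.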

Advanced Data Augmentation

The platform provides three built-in augmentation presets:
Light Augmentation
  • Rotation: ±10°
  • Horizontal flip: 50%
  • Brightness: ±10%
Moderate Augmentation
  • Rotation: ±20°
  • Horizontal flip: 50%
  • Vertical flip: 30%
  • Brightness: ±20%
  • Contrast: ±20%
Heavy Augmentation
  • Rotation: ±30°
  • Horizontal & vertical flip: 50%
  • Brightness: ±30%
  • Contrast: ±30%
  • Gaussian noise: 5%

Who is This For?

Researchers & Students

Ideal for academic projects and experiments in:
  • Deep learning for cybersecurity
  • Malware analysis and classification
  • Computer vision applications
  • Model interpretability research

ML Engineers

Provides a production-ready framework for:
  • Rapid prototyping of CNN architectures
  • Transfer learning experimentation
  • Hyperparameter tuning and optimization
  • Model performance benchmarking

Security Analysts

Enables security teams to:
  • Build custom malware classifiers
  • Analyze model predictions with Grad-CAM
  • Identify misclassification patterns
  • Evaluate model robustness

Architecture Principles

1. Self-Contained Pages

Each page in the content/ directory is fully self-contained, with its own folder structure.

2. State Management

All session state access goes through abstraction layers in the state/ module; pages never access st.session_state directly.

3. Tab-Based Organization

Complex pages split their content across multiple tab files for better code organization.

4. Flat Components

Shared UI components live in a flat components/ directory rather than in nested subfolders.
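The State Management principle can be illustrated with a thin accessor module: pages call named functions instead of reading `st.session_state` keys themselves. Everything here (function names, key strings, and the optional `store` argument that lets a plain dict stand in for session state in tests) is a hypothetical sketch of the pattern, not the contents of `state/workflow.py`.

```python
from collections.abc import MutableMapping
from typing import Optional

# Hypothetical session keys, namespaced by module.
DATASET_KEY = "workflow.dataset_config"
MODEL_KEY = "workflow.model_config"


def _store(store: Optional[MutableMapping] = None) -> MutableMapping:
    """Return the backing store: an injected mapping, or Streamlit's session state."""
    if store is not None:
        return store
    import streamlit as st  # imported lazily so the accessors work without Streamlit

    return st.session_state


def get_dataset_config(store: Optional[MutableMapping] = None) -> Optional[dict]:
    return _store(store).get(DATASET_KEY)


def set_dataset_config(config: dict, store: Optional[MutableMapping] = None) -> None:
    _store(store)[DATASET_KEY] = config


def set_model_config(config: dict, store: Optional[MutableMapping] = None) -> None:
    _store(store)[MODEL_KEY] = config


def is_ready_for_training(store: Optional[MutableMapping] = None) -> bool:
    """Pages ask this question instead of inspecting raw session keys."""
    s = _store(store)
    return s.get(DATASET_KEY) is not None and s.get(MODEL_KEY) is not None
```

A page would then write `if not is_ready_for_training(): st.warning("Configure the dataset and model first.")`, and renaming a key only touches the state module.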

Project Structure

app/
├── main.py                      # Entry point + navigation
├── content/                     # Self-contained page modules
│   ├── home/                   # Home & session setup
│   ├── dataset/                # Dataset configuration (4 tabs)
│   ├── model/                  # Model architecture builder
│   ├── training/               # Training configuration
│   ├── monitor/                # Live training monitor
│   ├── results/                # Results & evaluation
│   └── interpret/              # Model interpretability
├── components/                  # Shared UI components
│   ├── header.py               # App header with session info
│   ├── sidebar.py              # Configuration status
│   ├── theme.py                # Theme customization
│   └── utils.py                # GPU detection, system info
├── state/                       # Session state management
│   ├── workflow.py             # ML workflow state
│   ├── ui.py                   # UI preferences
│   └── cache.py                # Cached data
├── models/                      # Model builders
│   └── pytorch/
│       ├── cnn_builder.py      # Custom CNN builder
│       ├── transfer.py         # Transfer learning
│       └── transformer.py      # Transformer models
├── training/                    # Training infrastructure
│   ├── engine.py               # Core training loop
│   ├── dataset.py              # PyTorch datasets
│   ├── transforms.py           # Data augmentation
│   └── optimizers.py           # Optimizer configuration
└── utils/                       # Utility functions
    ├── dataset_utils.py        # Dataset scanning
    └── dataset_viz.py          # Visualization helpers

Technology Stack

PyTorch

Deep learning framework for building and training neural networks

Streamlit

Interactive dashboard for the complete ML workflow

torchvision

Pre-trained models and image transformations

scikit-learn

Metrics calculation and evaluation tools

Plotly

Interactive visualizations and charts

UMAP

Dimensionality reduction for embedding visualization
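As an illustration of the scikit-learn side, the helper below computes macro-averaged precision, recall, and F1 alongside a confusion matrix, then lists the most frequent true-to-predicted confusions, which is one simple way to surface misclassification patterns. The function name `summarize` and the returned dictionary layout are assumptions of this sketch.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support


def summarize(y_true, y_pred, class_names):
    """Macro-averaged precision/recall/F1 plus the most frequent confusions."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    labels = list(range(len(class_names)))
    cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows: true, cols: predicted
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)
    # (true class, predicted class, count) for every nonzero off-diagonal cell.
    confusions = [
        (class_names[i], class_names[j], int(off_diag[i, j]))
        for i, j in zip(*np.nonzero(off_diag))
    ]
    confusions.sort(key=lambda item: item[2], reverse=True)
    return {"precision": precision, "recall": recall, "f1": f1,
            "confusion_matrix": cm, "top_confusions": confusions}
```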

Next Steps

Quick Start

Get up and running in 5 minutes

Installation

Detailed installation instructions
This platform was developed as part of the Sistemas Inteligentes II course at Universidad de Caldas, taught by Professor Jorge Alberto Jaramillo Garzón.