Overview

The UC Intel Final platform provides three families of neural network architectures, each designed for different use cases and computational constraints:

Custom CNN

Build convolutional neural networks from scratch with configurable layer stacks

Transfer Learning

Fine-tune pre-trained models (VGG, ResNet, EfficientNet) for faster convergence

Vision Transformer

State-of-the-art transformer architecture with self-attention mechanisms

Base Model Interface

All models inherit from the BaseModel abstract class, ensuring a consistent interface. Location: app/models/base.py:11-71
from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple
import torch.nn as nn

class BaseModel(ABC):
    """Abstract base class for model implementations"""
    
    def __init__(self, config: Dict[str, Any]):
        """
        Initialize model with configuration
        
        Args:
            config: Model configuration dictionary
        """
        self.config = config
        self.model = None
    
    @abstractmethod
    def build(self) -> nn.Module:
        """
        Build and return the model
        
        Returns:
            PyTorch model (nn.Module)
        """
        pass
    
    @abstractmethod
    def get_parameters_count(self) -> Tuple[int, int]:
        """
        Get total and trainable parameter counts
        
        Returns:
            Tuple of (total_params, trainable_params)
        """
        pass
    
    def get_model_summary(self) -> Dict[str, Any]:
        """Get model summary statistics"""
        if self.model is None:
            self.model = self.build()
        
        total_params, trainable_params = self.get_parameters_count()
        
        return {
            "total_parameters": total_params,
            "trainable_parameters": trainable_params,
            "model_type": self.config.get("model_type", "Unknown"),
            "architecture": self.config.get("architecture", "Unknown"),
            "num_classes": self.config.get("num_classes", 0)
        }
All models implement the same interface, making it easy to swap architectures during experimentation.
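
As an illustration, here is a minimal sketch of a concrete subclass (the class name SimpleCNNModel and its tiny layer stack are hypothetical, not taken from the codebase):
from typing import Tuple
import torch.nn as nn

class SimpleCNNModel(BaseModel):
    """Hypothetical example of a model implementing the BaseModel contract"""

    def build(self) -> nn.Module:
        num_classes = self.config.get("num_classes", 2)
        # Deliberately tiny network, just to demonstrate the interface
        self.model = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, num_classes),
        )
        return self.model

    def get_parameters_count(self) -> Tuple[int, int]:
        if self.model is None:
            self.model = self.build()
        total = sum(p.numel() for p in self.model.parameters())
        trainable = sum(p.numel() for p in self.model.parameters() if p.requires_grad)
        return total, trainable

summary = SimpleCNNModel({"model_type": "Custom CNN", "num_classes": 9}).get_model_summary()
print(summary["total_parameters"], summary["trainable_parameters"])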

Custom CNN

Overview

The Custom CNN builder allows you to construct convolutional neural networks from a layer stack configuration. This provides maximum flexibility for architecture experimentation. Location: app/models/pytorch/cnn_builder.py

Architecture

Supported Layer Types

Conv2D - 2D Convolutional layer
Parameters:
  • filters (int): Number of output channels (default: 32)
  • kernel_size (int): Kernel size (default: 3)
  • activation (str): Activation function - "relu", "leaky_relu", "gelu", "swish" (default: "relu")
  • padding (str): "same" or "valid" (default: "same")
Implementation (app/models/pytorch/cnn_builder.py:178-194):
def _build_conv2d(self, in_channels: int, params: dict) -> tuple:
    filters = params.get("filters", 32)
    kernel_size = params.get("kernel_size", 3)
    activation = params.get("activation", "relu")
    padding_mode = params.get("padding", "same")
    
    padding = kernel_size // 2 if padding_mode == "same" else 0
    
    layers = [
        nn.Conv2d(in_channels, filters, 
                 kernel_size=kernel_size, 
                 padding=padding),
        self._get_activation(activation)
    ]
    
    return nn.Sequential(*layers), filters
Output shape: (batch, filters, height, width)

Activation Functions

Location: app/models/pytorch/cnn_builder.py:75-81
ACTIVATION_MAP = {
    "relu": nn.ReLU(inplace=True),
    "leaky_relu": nn.LeakyReLU(0.1, inplace=True),
    "gelu": nn.GELU(),
    "swish": nn.SiLU(inplace=True),
    "none": nn.Identity(),
}
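
The _get_activation helper referenced in _build_conv2d above is not shown in this excerpt; a plausible sketch, assuming ACTIVATION_MAP is a module-level constant and unknown names fall back to identity:
def _get_activation(self, name: str) -> nn.Module:
    # Sketch only: look up the activation by name; unknown names become a no-op.
    # Reusing the shared instances from ACTIVATION_MAP is safe here because
    # these activation modules are stateless (no learnable parameters).
    return ACTIVATION_MAP.get(name.lower(), nn.Identity())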

ReLU

Formula: f(x) = max(0, x)
Pros:
  • Fast computation
  • Sparse activation
  • Widely used
Cons:
  • Dying ReLU problem

Leaky ReLU

Formula: f(x) = x if x > 0 else 0.1x
Pros:
  • Fixes dying ReLU
  • Allows negative gradients
Use when: Training deep networks

GELU

Formula: f(x) = x * Φ(x) (Gaussian Error Linear Unit)
Pros:
  • Smooth activation
  • Better for transformers
  • State-of-the-art results
Use when: Using transformer-style architectures

Swish (SiLU)

Formula: f(x) = x * sigmoid(x)
Pros:
  • Self-gated activation
  • Smooth and non-monotonic
  • Often outperforms ReLU
Use when: Need smooth gradients

Example Configuration

Simple CNN for MNIST-style data:
config = {
    "model_type": "Custom CNN",
    "num_classes": 9,
    "cnn_config": {
        "layers": [
            # Block 1
            {"type": "Conv2D", "params": {"filters": 32, "kernel_size": 3, "activation": "relu"}},
            {"type": "Conv2D", "params": {"filters": 32, "kernel_size": 3, "activation": "relu"}},
            {"type": "MaxPooling2D", "params": {"pool_size": 2}},
            {"type": "BatchNorm"},
            {"type": "Dropout", "params": {"rate": 0.25}},
            
            # Block 2
            {"type": "Conv2D", "params": {"filters": 64, "kernel_size": 3, "activation": "relu"}},
            {"type": "Conv2D", "params": {"filters": 64, "kernel_size": 3, "activation": "relu"}},
            {"type": "MaxPooling2D", "params": {"pool_size": 2}},
            {"type": "BatchNorm"},
            {"type": "Dropout", "params": {"rate": 0.25}},
            
            # Block 3
            {"type": "Conv2D", "params": {"filters": 128, "kernel_size": 3, "activation": "relu"}},
            {"type": "GlobalAvgPool"},
            
            # Classifier
            {"type": "Dense", "params": {"units": 256, "activation": "relu"}},
            {"type": "Dropout", "params": {"rate": 0.5}},
        ]
    }
}
Parameter Count: ~200K parameters
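
As a rough cross-check of that figure: a Conv2D layer holds in_channels × out_channels × k × k weights plus out_channels biases, and a Dense layer holds in × out weights plus out biases. A back-of-the-envelope tally for the configuration above (assuming 3 input channels; BatchNorm contributes 2 parameters per channel):
def conv2d_params(c_in: int, c_out: int, k: int = 3) -> int:
    return c_in * c_out * k * k + c_out          # weights + bias

def dense_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out                  # weights + bias

total = (
    conv2d_params(3, 32) + conv2d_params(32, 32) + 2 * 32      # Block 1 + BatchNorm
    + conv2d_params(32, 64) + conv2d_params(64, 64) + 2 * 64   # Block 2 + BatchNorm
    + conv2d_params(64, 128)                                   # Block 3
    + dense_params(128, 256) + dense_params(256, 9)            # Classifier + output
)
print(total)  # ≈ 175K, in the same ballpark as the ~200K quoted above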

Forward Pass

Location: app/models/pytorch/cnn_builder.py:205-232
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """
    Forward pass
    
    Args:
        x: Input tensor of shape (batch, channels, height, width)
    
    Returns:
        Output logits of shape (batch, num_classes)
    """
    # Apply feature extraction layers
    for layer in self.feature_layers:
        x = layer(x)
    
    # Apply transition (flatten or global pool)
    if self.use_global_pool:
        x = torch.mean(x, dim=[2, 3])
    else:
        x = torch.flatten(x, 1)
    
    # Apply classifier layers
    for layer in self.classifier_layers:
        x = layer(x)
    
    # Output layer
    x = self.output_layer(x)
    
    return x
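
A usage sketch of the full pipeline, assuming build() returns an nn.Module whose forward pass is the one shown above (the CustomCNNModel name is hypothetical; the real entry point lives in cnn_builder.py):
import torch

# Hypothetical instantiation; see app/models/pytorch/cnn_builder.py for the real class
model = CustomCNNModel(config).build()
model.eval()

with torch.no_grad():
    images = torch.randn(8, 3, 64, 64)   # (batch, channels, height, width)
    logits = model(images)               # (batch, num_classes)

print(logits.shape)  # torch.Size([8, 9]) for the 9-class config above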

Transfer Learning

Overview

Transfer learning leverages pre-trained models trained on ImageNet (1.2M images, 1000 classes) to accelerate training and improve performance on smaller datasets. Location: app/models/pytorch/transfer.py

Supported Base Models

VGG16 / VGG19
Architecture: Deep CNNs with small 3x3 filters
Characteristics:
  • 16 or 19 layers
  • Simple, uniform architecture
  • Large number of parameters (~138M for VGG16)
Input size: 224x224
Feature dimensions: 512 (after global pooling)
Use when: Need simple, well-understood architecture
Implementation (app/models/pytorch/transfer.py:152-154):
"VGG16": lambda: models.vgg16(pretrained=use_pretrained),
"VGG19": lambda: models.vgg19(pretrained=use_pretrained),

Fine-Tuning Strategies

Location: app/models/pytorch/transfer.py:194-217
Strategy: Freeze all base model layers, train only the classifier
Implementation:
# Freeze all base model parameters
for param in self.base_model.parameters():
    param.requires_grad = False
Trainable parameters: ~10K (classifier only)
Use when:
  • Small dataset (<1000 images/class)
  • Limited compute resources
  • Domain similar to ImageNet
Training time: Fastest (1-2 hours)
Expected performance: Good baseline
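
With this strategy, the optimizer should only be given the parameters that remain trainable. A minimal sketch pairing the freeze loop above with an optimizer, assuming model is the transfer-learning wrapper (an nn.Module exposing base_model and classifier); the Adam choice and learning rate are illustrative:
import torch

# Freeze the backbone; the custom classifier head stays trainable
for param in model.base_model.parameters():
    param.requires_grad = False

# Pass only the trainable parameters to the optimizer
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)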

Custom Classifier Head

Location: app/models/pytorch/transfer.py:125-146
# Build custom classifier head
classifier_layers = []

if global_pooling:
    self.global_pool = nn.AdaptiveAvgPool2d((1, 1))
else:
    self.global_pool = None

if add_dense:
    # Two-layer classifier
    classifier_layers.extend([
        nn.Linear(in_features, dense_units),
        nn.ReLU(inplace=True),
        nn.Dropout(dropout),
        nn.Linear(dense_units, num_classes)
    ])
else:
    # Single-layer classifier
    classifier_layers.extend([
        nn.Dropout(dropout),
        nn.Linear(in_features, num_classes)
    ])

self.classifier = nn.Sequential(*classifier_layers)
Options:
  • Global Pooling: Reduces spatial dimensions to 1x1
  • Extra Dense Layer: Adds capacity (useful for complex domains)
  • Dropout: Regularization (default: 0.5)
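
For example, with a ResNet50 backbone the feature dimension after global pooling is 2048, so the two-layer head above works out to the following (the dense_units value is illustrative, not read from the codebase):
import torch.nn as nn

in_features = 2048   # ResNet50 feature dimension after global average pooling
dense_units = 256    # illustrative
num_classes = 9
dropout = 0.5        # default noted above

classifier = nn.Sequential(
    nn.Linear(in_features, dense_units),
    nn.ReLU(inplace=True),
    nn.Dropout(dropout),
    nn.Linear(dense_units, num_classes),
)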

Forward Pass

Location: app/models/pytorch/transfer.py:219-243
def forward(self, x: torch.Tensor) -> torch.Tensor:
    # Extract features with frozen/unfrozen base model
    features = self.base_model(x)
    
    # Apply global pooling if needed
    if self.global_pool is not None and len(features.shape) == 4:
        features = self.global_pool(features)
        features = torch.flatten(features, 1)
    elif len(features.shape) == 4:
        features = torch.flatten(features, 1)
    
    # Apply custom classifier
    output = self.classifier(features)
    
    return output

Vision Transformer

Overview

Vision Transformer (ViT) applies the transformer architecture (originally designed for NLP) to image classification by treating images as sequences of patches. Location: app/models/pytorch/transformer.py
Paper: “An Image is Worth 16x16 Words” (Dosovitskiy et al., 2020)

Architecture

Patch Embedding

Location: app/models/pytorch/transformer.py:72-114
Converts a 2D image into a sequence of patch embeddings:
class PatchEmbedding(nn.Module):
    def __init__(
        self,
        image_size: int = 224,
        patch_size: int = 16,      # 16x16 patches
        in_channels: int = 3,
        embed_dim: int = 768,
    ):
        super().__init__()
        self.image_size = image_size
        self.patch_size = patch_size
        self.num_patches = (image_size // patch_size) ** 2  # 196 for 224x224
        
        # Use convolution to extract and embed patches
        self.proj = nn.Conv2d(
            in_channels, 
            embed_dim, 
            kernel_size=patch_size, 
            stride=patch_size  # Non-overlapping patches
        )
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, embed_dim, H/P, W/P)
        x = self.proj(x)
        
        # (B, embed_dim, H/P, W/P) -> (B, num_patches, embed_dim)
        x = x.flatten(2).transpose(1, 2)
        
        return x
Example:
  • Input: (1, 3, 224, 224)
  • After projection: (1, 768, 14, 14)
  • After flatten: (1, 196, 768)
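
These shapes can be checked directly with the PatchEmbedding class as defined above:
import torch

patch_embed = PatchEmbedding(image_size=224, patch_size=16, in_channels=3, embed_dim=768)
x = torch.randn(1, 3, 224, 224)

print(patch_embed.num_patches)   # 196 = (224 // 16) ** 2
print(patch_embed(x).shape)      # torch.Size([1, 196, 768])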

Multi-Head Self-Attention

Location: app/models/pytorch/transformer.py:117-164
class MultiHeadAttention(nn.Module):
    def __init__(self, embed_dim: int, num_heads: int, dropout: float = 0.0):
        super().__init__()
        assert embed_dim % num_heads == 0
        
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.scale = self.head_dim ** -0.5
        
        # Single linear layer to compute Q, K, V
        self.qkv = nn.Linear(embed_dim, embed_dim * 3)
        self.attn_drop = nn.Dropout(dropout)
        self.proj = nn.Linear(embed_dim, embed_dim)
        self.proj_drop = nn.Dropout(dropout)
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        
        # Generate Q, K, V
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        qkv = qkv.permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]  # Each: (B, num_heads, N, head_dim)
        
        # Scaled dot-product attention
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)
        
        # Apply attention to values
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        
        # Output projection
        x = self.proj(x)
        x = self.proj_drop(x)
        
        return x
Attention Mechanism:
  1. Linear projection to Q, K, V
  2. Split into multiple heads
  3. Compute attention scores: Attention(Q, K, V) = softmax(QK^T / √d_k)V
  4. Concatenate heads
  5. Output projection
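
With the ViT-Base settings used below (embed_dim=768, num_heads=12), each head attends in a 64-dimensional subspace and the scaling factor is 1/√64 = 0.125; the module preserves the (batch, tokens, embed_dim) shape:
import torch

attn = MultiHeadAttention(embed_dim=768, num_heads=12)
print(attn.head_dim, attn.scale)   # 64, 0.125

tokens = torch.randn(2, 197, 768)  # 196 patch tokens + 1 CLS token
print(attn(tokens).shape)          # torch.Size([2, 197, 768])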

Transformer Block

Location: app/models/pytorch/transformer.py:195-220
class TransformerBlock(nn.Module):
    def __init__(
        self,
        embed_dim: int,
        num_heads: int,
        mlp_ratio: float = 4.0,
        dropout: float = 0.0,
    ):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.attn = MultiHeadAttention(embed_dim, num_heads, dropout)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.mlp = MLP(
            in_features=embed_dim,
            hidden_features=int(embed_dim * mlp_ratio),  # 3072 for 768-dim
            dropout=dropout
        )
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attention with residual (pre-norm)
        x = x + self.attn(self.norm1(x))
        
        # MLP with residual (pre-norm)
        x = x + self.mlp(self.norm2(x))
        
        return x
Structure: LayerNorm → Attention → Residual → LayerNorm → MLP → Residual

Configuration Options

ViT-Base

Configuration:
  • Patch size: 16
  • Embed dim: 768
  • Depth: 12 blocks
  • Heads: 12
  • MLP ratio: 4.0
Parameters: ~86M
Use when: Standard accuracy/speed tradeoff

ViT-Large

Configuration:
  • Patch size: 16
  • Embed dim: 1024
  • Depth: 24 blocks
  • Heads: 16
  • MLP ratio: 4.0
Parameters: ~307M
Use when: Maximum accuracy, large dataset

ViT-Small

Configuration:
  • Patch size: 16
  • Embed dim: 384
  • Depth: 12 blocks
  • Heads: 6
  • MLP ratio: 4.0
Parameters: ~22M
Use when: Limited compute, faster inference

Custom

Configurable parameters:
  • Patch size (8, 16, 32)
  • Embed dimension
  • Number of blocks
  • Number of heads
  • MLP ratio
  • Dropout rate
Use when: Specific requirements
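
A hypothetical custom configuration, following the same pattern as the CNN config earlier (the key names under vit_config are illustrative; see transformer.py for the exact schema):
config = {
    "model_type": "Vision Transformer",
    "num_classes": 9,
    "vit_config": {            # key names are illustrative
        "image_size": 224,
        "patch_size": 16,
        "embed_dim": 384,      # ViT-Small width
        "depth": 12,
        "num_heads": 6,
        "mlp_ratio": 4.0,
        "dropout": 0.1,
    },
}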

Forward Pass

Location: app/models/pytorch/transformer.py:304-338
def forward(self, x: torch.Tensor) -> torch.Tensor:
    B = x.shape[0]
    
    # 1. Patch embedding
    x = self.patch_embed(x)  # (B, num_patches, embed_dim)
    
    # 2. Add CLS token
    cls_tokens = self.cls_token.expand(B, -1, -1)  # (B, 1, embed_dim)
    x = torch.cat((cls_tokens, x), dim=1)  # (B, num_patches + 1, embed_dim)
    
    # 3. Add position embeddings
    x = x + self.pos_embed
    x = self.pos_drop(x)
    
    # 4. Apply transformer blocks
    for block in self.blocks:
        x = block(x)
    
    # 5. Normalize
    x = self.norm(x)
    
    # 6. Extract CLS token and classify
    cls_output = x[:, 0]  # (B, embed_dim)
    x = self.head(cls_output)  # (B, num_classes)
    
    return x
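
End to end, for ViT-Base on a 224x224 input the shapes evolve exactly as annotated above: 196 patch tokens plus the CLS token flow through 12 blocks, and only the CLS token feeds the classification head. A usage sketch (the VisionTransformer constructor and its arguments are assumptions; the actual class lives in transformer.py):
import torch

# Hypothetical constructor; see app/models/pytorch/transformer.py for the real signature
vit = VisionTransformer(image_size=224, patch_size=16, embed_dim=768,
                        depth=12, num_heads=12, num_classes=9)
vit.eval()

with torch.no_grad():
    logits = vit(torch.randn(2, 3, 224, 224))

print(logits.shape)  # torch.Size([2, 9])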

Model Selection Guide

Small (<1000 images/class):
  • ✅ Transfer Learning (Feature Extraction)
  • ✅ Transfer Learning (Partial Fine-tuning)
  • ⚠️ Custom CNN (risk of overfitting)
  • ❌ Vision Transformer (requires large dataset)
Medium (1000-5000 images/class):
  • ✅ Transfer Learning (Partial/Full Fine-tuning)
  • ✅ Custom CNN (with regularization)
  • ⚠️ Vision Transformer (may underperform)
Large (>5000 images/class):
  • ✅ All architectures
  • ✅ Vision Transformer (best performance)
  • ✅ Transfer Learning (Full Fine-tuning)
  • ✅ Custom CNN (deep architectures)

Performance Comparison

Typical Results on Malware Dataset

| Architecture | Parameters | Training Time | Accuracy | GPU Memory |
| --- | --- | --- | --- | --- |
| Custom CNN (Small) | ~200K | 1-2 hours | 85-88% | 2 GB |
| Custom CNN (Deep) | ~2M | 3-4 hours | 88-91% | 4 GB |
| ResNet50 (Feature Ext.) | ~25M | 1-2 hours | 90-93% | 4 GB |
| ResNet50 (Partial FT) | ~25M | 3-5 hours | 92-95% | 6 GB |
| ResNet50 (Full FT) | ~25M | 6-10 hours | 93-96% | 8 GB |
| EfficientNetB0 | ~5M | 2-4 hours | 91-94% | 3 GB |
| ViT-Small | ~22M | 8-12 hours | 90-93% | 8 GB |
| ViT-Base | ~86M | 12-24 hours | 94-97% | 16 GB |
Results vary based on dataset size, quality, and training configuration. These are representative ranges.

References

  • Custom CNN implementation: app/models/pytorch/cnn_builder.py
  • Transfer learning implementation: app/models/pytorch/transfer.py
  • Vision Transformer implementation: app/models/pytorch/transformer.py
  • Base model interface: app/models/base.py
  • Model building in training worker: app/training/worker.py:29-42