# System Architecture

> Comprehensive overview of the UC Intel Final malware classification platform architecture

## Overview

The UC Intel Final platform is a malware classification system with a modular architecture that separates three concerns: the web interface (Streamlit), the machine learning models (PyTorch), and the training pipeline.

<CardGroup cols={2}>
  <Card title="Streamlit UI" icon="display">
    Interactive multi-page dashboard for experiment configuration and monitoring
  </Card>

  <Card title="PyTorch Models" icon="brain">
    Custom CNN, Transfer Learning, and Vision Transformer architectures
  </Card>

  <Card title="Training Pipeline" icon="gears">
    Background training engine with real-time monitoring and checkpointing
  </Card>

  <Card title="State Management" icon="database">
    File-based persistence with session state abstraction
  </Card>
</CardGroup>

## High-Level Architecture

```mermaid theme={null}
graph TB
    User[User] --> UI[Streamlit UI Layer]
    UI --> State[State Management]
    UI --> Components[Shared Components]
    
    State --> Workflow[Workflow State]
    State --> UIState[UI Preferences]
    State --> Cache[Cached Data]
    State --> Persist[File Persistence]
    
    UI --> Content[Page Modules]
    Content --> Dataset[Dataset Config]
    Content --> Model[Model Builder]
    Content --> Training[Training Config]
    Content --> Monitor[Training Monitor]
    Content --> Results[Results & Evaluation]
    Content --> Interpret[Interpretability]
    
    Model --> Models[PyTorch Models]
    Models --> CustomCNN[Custom CNN Builder]
    Models --> Transfer[Transfer Learning]
    Models --> Transformer[Vision Transformer]
    
    Training --> Engine[Training Engine]
    Engine --> Worker[Background Worker]
    Worker --> Checkpoint[Checkpoint Manager]
    Worker --> Metrics[Metric Tracking]
    
    Dataset --> DataLoader[PyTorch DataLoader]
    DataLoader --> Transforms[Data Transforms]
    DataLoader --> Augmentation[Augmentation Pipeline]
    
    Persist -.-> DiskStorage[(Session Files)]
    Cache -.-> Memory[(Cached Data)]
```

## Application Architecture

### Directory Structure

The platform follows a **self-contained architecture** where each module is isolated and communicates through well-defined interfaces:

```
app/
├── main.py                      # Entry point + navigation setup
│
├── content/                     # Self-contained page modules
│   ├── home/                   # Session management
│   ├── dataset/                # Dataset configuration with tabs
│   ├── model/                  # Model architecture builder
│   ├── training/               # Training configuration
│   ├── monitor/                # Live training monitoring
│   ├── results/                # Results & evaluation
│   └── interpret/              # Model interpretability
│
├── components/                  # Shared UI components (flat structure)
│   ├── header.py               # App header with session info
│   ├── sidebar.py              # Configuration status sidebar
│   ├── theme.py                # Theme customization
│   ├── styling.py              # CSS injection
│   └── utils.py                # GPU detection, utilities
│
├── state/                       # Session state management
│   ├── workflow.py             # ML workflow state
│   ├── ui.py                   # UI preferences
│   ├── cache.py                # Cached operations
│   ├── session_state.py        # Session utilities
│   └── persistence.py          # File-based persistence
│
├── models/                      # PyTorch model architectures
│   ├── base.py                 # Abstract base class
│   ├── pytorch/
│   │   ├── cnn_builder.py     # Custom CNN builder
│   │   ├── transfer.py        # Transfer learning models
│   │   └── transformer.py     # Vision Transformer
│   └── manual/                 # Manual implementations
│
├── training/                    # Training infrastructure
│   ├── engine.py               # Core training loop
│   ├── worker.py               # Background training
│   ├── dataset.py              # PyTorch Dataset & DataLoader
│   ├── transforms.py           # Image transformations
│   ├── optimizers.py           # Optimizer & scheduler factory
│   └── evaluator.py            # Model evaluation
│
└── utils/                       # Utility functions
    ├── dataset_utils.py        # Dataset scanning
    ├── dataset_viz.py          # Visualizations
    └── checkpoint_manager.py   # Model checkpointing
```

<Note>
  The architecture uses **NO `__init__.py` files**. All imports are absolute and resolve from the `app/` directory (e.g., `from models.pytorch.cnn_builder import CustomCNNBuilder`).
</Note>

## Architecture Principles

### 1. Self-Contained Page Modules

Each page in `content/` is fully self-contained with:

* `page.py` - Entry point that renders header/sidebar and calls view
* `view.py` - Main view logic and coordinator
* `tabs/` - Optional subfolder for complex multi-tab pages

**Example: Dataset Page Structure**

```python theme={null}
# content/dataset/page.py
from components.header import render_header
from components.sidebar import render_sidebar
from content.dataset import view

render_header()
render_sidebar()
view.render()
```

### 2. State Management Abstraction

<Info>
  **Critical Design Pattern**: All session state access goes through `state/` module functions. **NEVER** access `st.session_state` directly from page code.
</Info>

State is divided into three domains:

* **`workflow.py`** - ML workflow state (configs, training status, results)
* **`ui.py`** - UI preferences (theme, past sessions)
* **`cache.py`** - Cached operations (dataset scans, expensive computations)

**Example: Proper State Access**

```python theme={null}
# ✅ CORRECT
from state.workflow import get_dataset_config, save_dataset_config
config = get_dataset_config()
save_dataset_config(new_config)

# ❌ WRONG
config = st.session_state.dataset_config
st.session_state.dataset_config = new_config
```

See `app/state/workflow.py:57-365` for complete implementation.
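
The accessor pattern can be sketched as follows. This is a minimal illustration, not the contents of `state/workflow.py`: a plain dict (`_store`) stands in for `st.session_state` so the sketch runs without Streamlit, and the `_DEFAULTS` table is hypothetical.

```python
from copy import deepcopy
from typing import Any

# Stand-in for st.session_state so the sketch runs without Streamlit (assumption).
_store: dict[str, Any] = {}

# Hypothetical default returned when a config has not been saved yet.
_DEFAULTS: dict[str, Any] = {"dataset_config": {}}


def get_dataset_config() -> dict[str, Any]:
    """Return a copy of the dataset config, falling back to the default."""
    value = _store.get("dataset_config", _DEFAULTS["dataset_config"])
    return deepcopy(value)


def save_dataset_config(config: dict[str, Any]) -> None:
    """Store a copy so later mutations by the caller do not leak into state."""
    _store["dataset_config"] = deepcopy(config)
```

Keeping every read and write behind functions like these gives one place to add defaults, validation, or persistence without touching page code.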

### 3. Background Training with Thread Safety

Training runs in a **background thread** to avoid blocking the UI. The worker uses **file-based I/O** instead of `st.session_state` because session state is bound to the run context of the Streamlit script thread and is not reliably accessible from other threads:

```python theme={null}
def _run_training(session_id: str, experiment_id: str):
    # Get configs from files (thread-safe)
    experiment = get_experiment_from_file(session_id, experiment_id)
    model_config = get_model_from_file(session_id, experiment['model_id'])
    
    # Build model and train
    model = build_model(model_config)
    engine = TrainingEngine(...)
    results = engine.fit(epochs=epochs)
    
    # Write results to file (thread-safe)
    write_experiment_update(session_id, experiment_id, results)
```

See `app/training/worker.py:45-257` for complete implementation.
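
The file-based hand-off between the worker and the UI might look like the sketch below. It uses only the standard library; `write_progress`, `read_progress`, `run_training_stub`, and `start_worker` are hypothetical names, and the loop is a placeholder for the real training engine.

```python
import json
import os
import tempfile
import threading
from pathlib import Path


def write_progress(path: Path, payload: dict) -> None:
    """Write JSON atomically so a polling reader never sees a partial file."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    os.replace(tmp_name, path)  # atomic rename within the same directory


def read_progress(path: Path) -> dict:
    """Return the latest progress, or an empty dict if nothing was written yet."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}


def run_training_stub(progress_file: Path, epochs: int) -> None:
    """Placeholder loop: a real worker would call the training engine here."""
    for epoch in range(1, epochs + 1):
        write_progress(progress_file, {"epoch": epoch, "status": "running"})
    write_progress(progress_file, {"epoch": epochs, "status": "completed"})


def start_worker(progress_file: Path, epochs: int) -> threading.Thread:
    """Start the stub in a daemon thread and return the thread handle."""
    thread = threading.Thread(
        target=run_training_stub, args=(progress_file, epochs), daemon=True
    )
    thread.start()
    return thread
```

Writing to a temporary file and renaming it means the UI thread reads either the previous update or the new one, never a half-written JSON document.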

### 4. Model Architecture Pattern

All models inherit from the abstract `BaseModel` class and implement:

* `build()` - Constructs and returns the PyTorch `nn.Module`
* `get_parameters_count()` - Returns total and trainable parameter counts
* `validate_config()` - Validates model configuration

```python theme={null}
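from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple

import torch.nn as nn


class BaseModel(ABC):
    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.model = None
    
    @abstractmethod
    def build(self) -> nn.Module:
        """Construct and return the PyTorch module."""
    
    @abstractmethod
    def get_parameters_count(self) -> Tuple[int, int]:
        """Return (total, trainable) parameter counts."""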
```

See `app/models/base.py:11-71` for complete implementation.

## Data Flow

### Configuration Flow

```mermaid theme={null}
sequenceDiagram
    participant User
    participant UI as Streamlit Page
    participant State as State Module
    participant Persist as File Persistence
    
    User->>UI: Configure dataset/model/training
    UI->>State: save_*_config(config)
    State->>st.session_state: Update session state
    State->>Persist: save_session(session_id)
    Persist->>Disk: Write JSON file
    UI->>User: Show confirmation
```

### Training Flow

```mermaid theme={null}
sequenceDiagram
    participant User
    participant UI as Streamlit UI (Main Thread)
    participant Worker as Training Worker (Background Thread)
    participant Engine as Training Engine
    participant Disk as File Storage
    
    User->>UI: Start training
    UI->>Worker: start_training(experiment_id)
    Note over Worker: Thread starts
    Worker->>Disk: Read configs from files
    Worker->>Engine: Initialize & fit()
    
    loop Every batch
        Engine->>Worker: batch_callback(metrics)
        Worker->>Disk: Write progress update
        UI->>Disk: Poll for updates (fragment)
        Disk->>UI: Latest metrics
        UI->>User: Display live metrics
    end
    
    Engine->>Worker: Training complete
    Worker->>Disk: Write final results
    UI->>User: Show completion
```

### Real-Time Monitoring

The monitoring page uses Streamlit's `@st.fragment(run_every="1s")` to create auto-refreshing components:

```python theme={null}
@st.fragment(run_every="1s")
def live_training_monitor():
    if not is_training_active():
        return
    
    # Get latest metrics from file
    results = get_results()
    
    # Display live metrics
    col1, col2, col3 = st.columns(3)
    col1.metric("Epoch", results.get("epoch", 0))
    col2.metric("Loss", f"{results.get('loss', 0):.4f}")
    col3.metric("Accuracy", f"{results.get('accuracy', 0):.2%}")
```

See `app/.md/arch.md:432-464` for complete pattern.

## Component Architecture

### Shared Components

Components are **flat, reusable modules** that can be imported by any page:

<Tabs>
  <Tab title="Header">
    **Location**: `app/components/header.py`

    Renders the application header with:

    * Application title and logo
    * Current session ID
    * Session info button
    * Navigation breadcrumbs

    ```python theme={null}
    from components.header import render_header
    render_header()
    ```
  </Tab>

  <Tab title="Sidebar">
    **Location**: `app/components/sidebar.py`

    Shows configuration status in the sidebar:

    * ✅ Dataset configured
    * ✅ Model configured
    * ✅ Training configured
    * Quick navigation links
    * GPU/CPU status

    ```python theme={null}
    from components.sidebar import render_sidebar
    render_sidebar()
    ```
  </Tab>

  <Tab title="Theme">
    **Location**: `app/components/theme.py`

    Theme customization with:

    * Color pickers (primary, secondary, background)
    * 4 preset themes (Green, Blue, Pink, Orange)
    * Dynamic CSS injection
    * Persistent across sessions

    ```python theme={null}
    from components.theme import render_theme_customization
    render_theme_customization()
    ```
  </Tab>

  <Tab title="Utils">
    **Location**: `app/components/utils.py`

    Utility functions:

    * `detect_gpu()` - Detect CUDA/MPS availability
    * `get_gpu_memory()` - Get GPU memory usage
    * `generate_session_id()` - Generate unique session IDs
    * `format_bytes()` - Format byte sizes
  </Tab>
</Tabs>

## Integration Points

### PyTorch Integration

The platform integrates tightly with PyTorch for all ML operations:

**Model Building** → `app/models/pytorch/`

* Custom CNN from layer configuration
* Transfer learning with pre-trained models (VGG, ResNet, EfficientNet)
* Vision Transformers with patch embeddings

**Data Loading** → `app/training/dataset.py:129-249`

* `MalwareDataset` - Custom PyTorch Dataset
* Automatic train/val/test splitting
* Weighted sampling for imbalanced classes
* Data augmentation pipeline
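
Weighted sampling needs one weight per sample. The source does not state which weighting scheme `dataset.py` uses, so the sketch below assumes plain inverse class frequency; `inverse_frequency_weights` is a hypothetical helper.

```python
from collections import Counter


def inverse_frequency_weights(labels: list[int]) -> list[float]:
    """Per-sample weights proportional to 1 / (size of the sample's class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]
```

With these weights every class carries the same total weight, so rare malware families are drawn about as often as common ones. In PyTorch the list would typically be passed to `torch.utils.data.WeightedRandomSampler`.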

**Training** → `app/training/engine.py:13-306`

* `TrainingEngine` - Core training loop
* Callbacks for checkpointing and metrics
* Early stopping support
* Learning rate scheduling
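
Early stopping can be sketched as a small stateful helper. The class below is an illustration, not the logic in `engine.py`; the `patience` and `min_delta` parameters are assumptions.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.stale_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

A training loop would call `step()` once per epoch after validation and break out of the loop when it returns `True`.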

### File System Integration

**Session Persistence** → `app/state/persistence.py`

```
.streamlit_sessions/
└── {session_id}/
    ├── session.json       # Session metadata
    ├── dataset.json       # Dataset configuration
    ├── models.json        # Model configurations
    ├── training.json      # Training configurations
    └── experiments.json   # Experiment results
```
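
Reading and writing this layout can be sketched with `json` and `pathlib`. The helper names `save_section` and `load_section` are hypothetical and are not the functions in `persistence.py`; only the directory and file names come from the layout above.

```python
import json
from pathlib import Path
from typing import Any

SESSIONS_ROOT = Path(".streamlit_sessions")  # directory name from the layout above


def save_section(session_id: str, section: str, data: dict[str, Any],
                 root: Path = SESSIONS_ROOT) -> Path:
    """Write one section (e.g. "dataset") to {root}/{session_id}/{section}.json."""
    session_dir = root / session_id
    session_dir.mkdir(parents=True, exist_ok=True)
    target = session_dir / f"{section}.json"
    target.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return target


def load_section(session_id: str, section: str,
                 root: Path = SESSIONS_ROOT) -> dict[str, Any]:
    """Read one section back; a missing file yields an empty dict."""
    target = root / session_id / f"{section}.json"
    if not target.exists():
        return {}
    return json.loads(target.read_text(encoding="utf-8"))
```

One file per section keeps each write small and lets a background thread read a single configuration without parsing the whole session.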

**Model Checkpoints** → `app/utils/checkpoint_manager.py`

```
checkpoints/
└── {experiment_id}/
    ├── checkpoint_epoch_10.pth
    ├── checkpoint_epoch_20.pth
    └── best_model.pth     # Best model by validation loss
```
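
The naming scheme above suggests periodic checkpoints plus a separately tracked best model. A minimal sketch of that bookkeeping follows; `CheckpointTracker`, its `save_fn` callback, and the interval of 10 epochs are assumptions inferred from the file names, not the API of `checkpoint_manager.py`.

```python
from pathlib import Path
from typing import Callable


class CheckpointTracker:
    """Decide which checkpoint files to write at the end of each epoch."""

    def __init__(self, experiment_dir: Path, save_fn: Callable[[Path], None],
                 every: int = 10) -> None:
        self.experiment_dir = experiment_dir
        self.save_fn = save_fn  # e.g. a wrapper around torch.save (assumption)
        self.every = every      # hypothetical periodic interval
        self.best_loss = float("inf")
        experiment_dir.mkdir(parents=True, exist_ok=True)

    def on_epoch_end(self, epoch: int, val_loss: float) -> list[Path]:
        """Save a periodic checkpoint and/or the best model; return written paths."""
        written: list[Path] = []
        if epoch % self.every == 0:
            path = self.experiment_dir / f"checkpoint_epoch_{epoch}.pth"
            self.save_fn(path)
            written.append(path)
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            path = self.experiment_dir / "best_model.pth"
            self.save_fn(path)
            written.append(path)
        return written
```

Passing the save operation in as a callback keeps the bookkeeping independent of PyTorch, which also makes it easy to exercise without a model.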

## Performance Considerations

### Caching Strategy

The platform uses Streamlit's caching decorators to optimize performance:

**Data Caching** - `@st.cache_data(ttl=300)`

* Dataset scanning (expensive directory traversal)
* Image loading and preprocessing
* Visualization generation

**Resource Caching** - `@st.cache_resource`

* Trained model loading (singleton)
* Large data structures
* Database connections

```python theme={null}
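from typing import Any

import streamlit as st
import torch


@st.cache_data(ttl=300)
def scan_dataset_directory(base_path: str) -> dict[str, Any]:
    """Expensive dataset scanning - cached for 5 minutes"""
    # ...


@st.cache_resource
def load_trained_model(model_path: str):
    """Singleton model loading - one instance shared across reruns and sessions"""
    # PyTorch 2.6+ defaults to weights_only=True; a pickled nn.Module needs weights_only=False
    return torch.load(model_path)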
```

See `app/.md/arch.md:405-429` for complete patterns.

### GPU Memory Management

* Automatic device detection (CUDA > MPS > CPU)
* Batch size configuration based on available memory
* Model parameter counting before training
* Memory monitoring in sidebar
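
The device priority can be expressed as a small pure function. `pick_device` is a hypothetical helper, not the project's `detect_gpu()`; it takes the availability flags as arguments so the sketch runs without PyTorch installed.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name using the priority CUDA > MPS > CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

With PyTorch, the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.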

## Security & Isolation

### Session Isolation

Each user session is isolated with:

* Unique session ID (UUID4)
* Separate file storage directory
* Independent session state
* Isolated experiment tracking
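
A minimal sketch of how a UUID4 session ID maps to its own storage directory is shown below. The body of `generate_session_id()` and the `session_dir` helper are assumptions for illustration; the source only states that IDs are UUID4 values and that each session has a separate directory.

```python
import uuid
from pathlib import Path


def generate_session_id() -> str:
    """Create a random UUID4 string for a new session."""
    return str(uuid.uuid4())


def session_dir(session_id: str, root: Path) -> Path:
    """Map a session ID to its directory, rejecting anything that is not a UUID."""
    # Parsing as a UUID raises ValueError for inputs such as "../other",
    # so malformed IDs never reach the file system.
    canonical = str(uuid.UUID(session_id))
    return root / canonical
```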

### Thread Safety

Background training threads are isolated from the UI thread:

* No shared session state: configs and metrics are exchanged through file I/O
* Thread registry for pause/stop control
* Daemon threads (auto-cleanup on exit)

```python theme={null}
# Global registry for active training engines
_active_engines: dict[str, TrainingEngine] = {}
_training_threads: dict[str, threading.Thread] = {}
```

See `app/training/worker.py:24-26` for implementation.

## Extensibility

### Adding New Model Architectures

1. Create a new file in `app/models/pytorch/`
2. Inherit from `BaseModel`
3. Implement `build()` and `get_parameters_count()`
4. Register in `worker.py:build_model()`

```python theme={null}
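from typing import Tuple

import torch.nn as nn

from models.base import BaseModel


class MyCustomModel(BaseModel):
    def build(self) -> nn.Module:
        # Build your model and store it so get_parameters_count() can read it
        self.model = ...  # replace with your nn.Module
        return self.model
    
    def get_parameters_count(self) -> Tuple[int, int]:
        # Count all parameters, then only those that receive gradients
        total = sum(p.numel() for p in self.model.parameters())
        trainable = sum(p.numel() for p in self.model.parameters() 
                       if p.requires_grad)
        return total, trainable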
```
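
The registration step can be sketched as a lookup from a type string to a factory. How `worker.py:build_model()` actually dispatches is not shown in this document, so the registry, the `register_model` decorator, and the `"model_type"` config key below are all hypothetical.

```python
from typing import Any, Callable

# Hypothetical registry mapping a model type name to a factory function.
_MODEL_BUILDERS: dict[str, Callable[[dict[str, Any]], Any]] = {}


def register_model(model_type: str):
    """Decorator that records a factory under the given type name."""
    def decorator(factory: Callable[[dict[str, Any]], Any]):
        _MODEL_BUILDERS[model_type] = factory
        return factory
    return decorator


def build_model(model_config: dict[str, Any]) -> Any:
    """Look up the factory for the configured type and build the model."""
    model_type = model_config.get("model_type")  # key name is an assumption
    if model_type not in _MODEL_BUILDERS:
        raise ValueError(f"Unknown model_type: {model_type!r}")
    return _MODEL_BUILDERS[model_type](model_config)


@register_model("my_custom")
def _build_my_custom(config: dict[str, Any]) -> Any:
    # A real factory would return MyCustomModel(config).build();
    # a dict stands in here so the sketch runs without PyTorch.
    return {"built": "my_custom", "config": config}
```

A registry of this kind lets a new architecture be added by registering one factory, without editing a chain of `if`/`elif` branches.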

### Adding New Pages

1. Create a folder in `content/` with `page.py` and `view.py`
2. For complex pages, add `tabs/` subfolder
3. Register in `main.py` navigation

```python theme={null}
st.Page("content/my_page/page.py", title="My Page", icon="🆕")
```

## Technology Stack

<CardGroup cols={2}>
  <Card title="Frontend" icon="display">
    * **Streamlit** - Web UI framework
    * **Plotly** - Interactive visualizations
    * **Pillow** - Image processing
  </Card>

  <Card title="Backend" icon="server">
    * **PyTorch** - Deep learning framework
    * **scikit-learn** - Data splitting & metrics
    * **NumPy** - Numerical operations
  </Card>

  <Card title="Models" icon="brain">
    * **torchvision** - Pre-trained models
    * **Custom CNN** - Layer stack builder
    * **Vision Transformer** - From-scratch implementation
  </Card>

  <Card title="Infrastructure" icon="gears">
    * **Threading** - Background training
    * **JSON** - Configuration persistence
    * **pathlib** - File system operations
  </Card>
</CardGroup>

## References

* Complete architecture documentation: `app/.md/arch.md`
* Model implementations: `app/models/`
* Training pipeline: `app/training/`
* State management: `app/state/`
