# Performance Optimization Guide
## Overview
Ghost is designed for process injection detection with configurable performance characteristics. This guide covers actual optimization strategies and expected performance.
## Performance Characteristics
### Expected Detection Engine Performance
- **Process Enumeration**: 10-50ms for all system processes
- **Memory Region Analysis**: 1-5ms per process (platform-dependent)
- **Thread Enumeration**: 1-10ms per process
- **Detection Heuristics**: <1ms per process
- **Memory Usage**: ~10-20MB for core engine

**Note**: Actual performance varies significantly by:
- Number of processes (100-1000+ typical)
- Memory region count per process
- Thread count per process
- Platform (Windows APIs vs Linux procfs)
### Configuration Options
#### 1. Selective Detection
```rust
use ghost_core::config::DetectionConfig;
// Disable expensive detections for performance
let mut config = DetectionConfig::default();
config.rwx_detection = true; // Fast: O(n) memory regions
config.shellcode_detection = false; // Skip pattern matching
config.hook_detection = false; // Skip module enumeration
config.thread_detection = true; // Moderate: thread enum
config.hollowing_detection = false; // Skip heuristics
```
#### 2. Preset Modes
```rust
// Fast scanning mode
let config = DetectionConfig::performance_mode();
// Thorough scanning mode
let config = DetectionConfig::thorough_mode();
```
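
The guide does not list which detections each preset turns on. As a purely illustrative sketch, a preset constructor could be built with struct update syntax as below. `SketchConfig` is a local stand-in, not the real `ghost_core` type, and the flag choices are assumptions that mirror the selective configuration in section 1.

```rust
/// Local stand-in for illustration only; not the real `ghost_core` type.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchConfig {
    pub rwx_detection: bool,
    pub shellcode_detection: bool,
    pub hook_detection: bool,
    pub thread_detection: bool,
    pub hollowing_detection: bool,
}

impl SketchConfig {
    /// Hypothetical fast preset: keep only the cheaper checks.
    pub fn performance_mode() -> Self {
        Self {
            rwx_detection: true,
            thread_detection: true,
            shellcode_detection: false,
            hook_detection: false,
            hollowing_detection: false,
        }
    }

    /// Hypothetical thorough preset: start from the fast preset
    /// and switch the expensive checks back on.
    pub fn thorough_mode() -> Self {
        Self {
            shellcode_detection: true,
            hook_detection: true,
            hollowing_detection: true,
            ..Self::performance_mode()
        }
    }
}
```

Check the real preset definitions in the source before relying on which detections they enable.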
#### 3. Process Filtering
```rust
// Start from a mutable config so the fields below can be changed
let mut config = DetectionConfig::default();

// Skip system processes
config.skip_system_processes = true;

// Limit memory scan size
config.max_memory_scan_size = 10 * 1024 * 1024; // 10 MiB per process
```
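
How Ghost applies `max_memory_scan_size` internally is not shown in this guide. One plausible reading is a per-process byte budget that is spent region by region. The sketch below illustrates that reading only; `plan_region_reads` is a hypothetical helper, not part of `ghost_core`.

```rust
/// Decide how many bytes to read from each region so that the total
/// never exceeds `max_scan_size`. Regions are served in order; once the
/// budget is spent, the remaining regions get 0 bytes.
pub fn plan_region_reads(region_sizes: &[usize], max_scan_size: usize) -> Vec<usize> {
    let mut remaining = max_scan_size;
    let mut plan = Vec::with_capacity(region_sizes.len());
    for &size in region_sizes {
        let take = size.min(remaining);
        plan.push(take);
        remaining -= take; // cannot underflow: take <= remaining
    }
    plan
}
```

With a budget like this, a lower `max_memory_scan_size` bounds the worst-case read cost per process, at the price of leaving later regions unscanned.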
## Performance Considerations
### Platform-Specific Performance
**Windows**:
- `CreateToolhelp32Snapshot`: Single API call that snapshots all processes, fast
- `VirtualQueryEx`: One call per memory region, slower for processes with many regions
- `ReadProcessMemory`: Cross-process read, requires a handle opened with `PROCESS_VM_READ`
- `NtQueryInformationThread`: Native API (`ntdll`), one call per thread

**Linux**:
- `/proc` enumeration: Directory reads, fast
- `/proc/[pid]/maps` parsing: File I/O, moderate
- `/proc/[pid]/mem` reading: Requires ptrace access to the target (typically same user or `CAP_SYS_PTRACE`)
- `/proc/[pid]/task` parsing: Per-thread file I/O

**macOS**:
- `sysctl` with `KERN_PROC_ALL`: Single syscall, fast
- Memory/thread analysis: Not yet implemented
### Running Tests
```bash
# Run all tests including performance assertions
cargo test
# Run tests with timing output
cargo test -- --nocapture
```
## Tuning Guidelines
### For Continuous Monitoring
1. **Adjust scan interval**: Configure `scan_interval_ms` in `DetectionConfig`
2. **Skip system processes**: Set `skip_system_processes = true`
3. **Limit memory scans**: Reduce `max_memory_scan_size`
4. **Disable heavy detections**: Turn off `hook_detection` and `shellcode_detection`
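
The guide does not show how `scan_interval_ms` is consumed. One common pattern, sketched here as an assumption rather than Ghost's actual loop, subtracts the time the scan took from the interval so that scans start at a steady cadence. `sleep_after_scan` is a hypothetical helper.

```rust
use std::time::Duration;

/// Time to sleep after a scan so that scans start roughly every
/// `scan_interval_ms`. If the scan overran the interval, the result
/// is zero and the next scan starts immediately.
pub fn sleep_after_scan(scan_interval_ms: u64, scan_took: Duration) -> Duration {
    Duration::from_millis(scan_interval_ms).saturating_sub(scan_took)
}
```

If this value is regularly zero, the interval is shorter than the scan itself, and the other items in the list above are the ones to adjust.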
### For One-Time Analysis
1. **Enable all detections**: Use `DetectionConfig::thorough_mode()`
2. **Full memory scanning**: Increase `max_memory_scan_size`
3. **Include system processes**: Set `skip_system_processes = false`
## Platform-Specific Optimizations
### Windows
- Run as Administrator for full process access
- Use `PROCESS_QUERY_LIMITED_INFORMATION` when `PROCESS_QUERY_INFORMATION` fails
- Handle access denied errors gracefully (system processes)
### Linux
- Run with appropriate privileges (root or `CAP_SYS_PTRACE`)
- Handle permission denied for `/proc/[pid]/mem` gracefully
- Consider using process groups for batch access
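
Handling permission errors gracefully usually means treating "denied" and "process already exited" as a reason to skip the process, not to abort the scan. The sketch below shows that pattern with standard library calls only. `ReadOutcome` and `read_or_skip` are hypothetical names, and it targets whole-file reads such as `/proc/[pid]/maps` (reading `/proc/[pid]/mem` additionally needs seeking to mapped addresses).

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

#[derive(Debug, PartialEq)]
pub enum ReadOutcome {
    Data(Vec<u8>),
    /// Access was denied or the file vanished (process exited);
    /// the caller should move on to the next process.
    Skipped,
}

/// Read a per-process file, mapping expected failures to `Skipped`
/// and propagating anything unexpected.
pub fn read_or_skip(path: &Path) -> std::io::Result<ReadOutcome> {
    match fs::read(path) {
        Ok(bytes) => Ok(ReadOutcome::Data(bytes)),
        Err(e) if matches!(e.kind(), ErrorKind::PermissionDenied | ErrorKind::NotFound) => {
            Ok(ReadOutcome::Skipped)
        }
        Err(e) => Err(e),
    }
}
```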
### macOS
- Limited functionality (process enumeration only)
- Most detection features require kernel extensions or Endpoint Security framework
## Troubleshooting Performance Issues
### High CPU Usage
1. Reduce scan frequency (increase `scan_interval_ms`)
2. Disable thread analysis (`thread_detection = false`) if it is not needed on every scan
3. Skip memory region enumeration
4. Filter out known-good processes
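
Filtering known-good processes can be as simple as a name allowlist applied before the expensive analysis. This is a minimal sketch under that assumption; `Proc` and `filter_unknown` are hypothetical and do not reflect Ghost's `ProcessInfo` type.

```rust
use std::collections::HashSet;

/// Minimal stand-in for a process record.
pub struct Proc {
    pub pid: u32,
    pub name: String,
}

/// Keep only processes whose name is not on the allowlist.
pub fn filter_unknown<'a>(procs: &'a [Proc], known_good: &HashSet<&str>) -> Vec<&'a Proc> {
    procs
        .iter()
        .filter(|p| !known_good.contains(p.name.as_str()))
        .collect()
}
```

Note the trade-off: injection often targets trusted processes, so a name-based allowlist lowers CPU cost but also removes exactly those processes from coverage.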
### High Memory Usage
1. Reduce baseline cache size (track fewer processes)
2. Clear detection history periodically
3. Limit memory reading buffer sizes
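
One way to keep detection history from growing without bound is a fixed-capacity queue that evicts the oldest entry on overflow. The sketch below is an assumption about how such a cap could look, not Ghost's history implementation; `BoundedHistory` is a hypothetical type.

```rust
use std::collections::VecDeque;

/// History that holds at most `capacity` items, dropping the oldest first.
pub struct BoundedHistory<T> {
    items: VecDeque<T>,
    capacity: usize,
}

impl<T> BoundedHistory<T> {
    pub fn new(capacity: usize) -> Self {
        Self { items: VecDeque::with_capacity(capacity), capacity }
    }

    /// Add an item, evicting the oldest one if the history is full.
    pub fn push(&mut self, item: T) {
        if self.capacity == 0 {
            return; // a zero-capacity history stores nothing
        }
        if self.items.len() == self.capacity {
            self.items.pop_front();
        }
        self.items.push_back(item);
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }

    /// Oldest item still retained, if any.
    pub fn oldest(&self) -> Option<&T> {
        self.items.front()
    }
}
```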
### Slow Detection Response
1. Disable hook detection (expensive module enumeration)
2. Skip shellcode pattern matching
3. Use the performance preset (`DetectionConfig::performance_mode()`)
## Current Implementation Limits
**What's NOT implemented**:
- No performance metrics collection system
- No Prometheus/monitoring integration
- No SIMD-accelerated pattern matching
- No parallel/async process scanning (single-threaded)
- No LRU caching of results
- No batch processing APIs

**Current architecture**:
- Sequential process scanning
- Simple HashMap for baseline tracking
- Basic confidence scoring
- Manual timer-based intervals (TUI)
## Testing Performance
```rust
#[test]
fn test_detection_performance() {
    use std::time::Instant;

    let mut engine = DetectionEngine::new().unwrap();
    let process = ProcessInfo::new(1234, 4, "test.exe".to_string());
    let regions = vec![/* test regions */];

    let start = Instant::now();
    for _ in 0..100 {
        // Discard the result; only the elapsed time matters here
        let _ = engine.analyze_process(&process, &regions, None);
    }
    let duration = start.elapsed();

    // Should complete 100 analyses in under 100ms
    assert!(duration.as_millis() < 100, "100 analyses took {:?}", duration);
}
```
## Best Practices
1. **Start with defaults**: Use `DetectionConfig::default()` initially
2. **Profile specific modules**: Identify which detection is slow
3. **Adjust based on needs**: Disable features you don't need
4. **Handle errors gracefully**: Processes may exit during scan
5. **Test on target hardware**: Performance varies by system
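
Since no metrics collection is built in, profiling a specific module can start with a small timing wrapper around each detection step. The helper below is a sketch using only `std::time`; `timed` is a hypothetical name and the labels are up to the caller.

```rust
use std::time::{Duration, Instant};

/// Run `f`, record how long it took under `label`, and return its result.
pub fn timed<T>(
    label: &str,
    timings: &mut Vec<(String, Duration)>,
    f: impl FnOnce() -> T,
) -> T {
    let start = Instant::now();
    let out = f();
    timings.push((label.to_string(), start.elapsed()));
    out
}
```

Wrapping each detection call this way and sorting `timings` by duration gives a first indication of which detection to disable or tune.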
## Future Performance Improvements
Potential enhancements (not yet implemented):
- Parallel process analysis using rayon
- Async I/O for file system operations (Linux)
- Result caching with TTL
- Incremental scanning (only changed processes)
- Memory-mapped file parsing
- SIMD pattern matching for shellcode
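
As an illustration of the first item, the sketch below splits a PID list across worker threads. It deliberately uses `std::thread::scope` (Rust 1.63+) in place of rayon so that it has no dependencies, and it is not an existing Ghost API; `scan_parallel` and the `analyze` callback are hypothetical.

```rust
use std::thread;

/// Analyze `pids` on up to `workers` threads and return
/// `(pid, flagged)` pairs in the original input order.
pub fn scan_parallel<F>(pids: &[u32], workers: usize, analyze: F) -> Vec<(u32, bool)>
where
    F: Fn(u32) -> bool + Sync,
{
    if pids.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so every PID lands in some chunk.
    let chunk_size = (pids.len() + workers - 1) / workers;
    let analyze = &analyze;
    thread::scope(|s| {
        // Spawn one thread per chunk; collecting forces all spawns first.
        let handles: Vec<_> = pids
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|&pid| (pid, analyze(pid)))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Joining in spawn order preserves the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Whether this helps in practice depends on how much of a scan is spent waiting on the OS versus on CPU-bound analysis, which is worth measuring before adopting it.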