From Docker to Multi-Cloud HPC: Securing LLM Workflows for Critical Infrastructure

How a dual-container architecture deploys large language models across AWS, GCP, and Azure for energy grid analytics, with automated security testing, ML-driven insights, and a path to HPC at national laboratories.

Tags: Docker, Kubernetes, FastAPI, Multi-Cloud, DER Analytics, IEEE Published

Large language models are increasingly embedded in operational workflows where failure has consequences: power grid analytics, clinical decision support, financial compliance. But deploying these models securely across heterogeneous cloud environments remains an open engineering problem. API keys get hard-coded into images. Container base layers ship with unpatched CVEs. Model endpoints sprawl across providers with no unified monitoring.

This work introduces a containerized framework that addresses these problems. The framework deploys LLM-powered analytics across AWS, Google Cloud Platform, and Microsoft Azure simultaneously. It processes Distributed Energy Resource (DER) time-series data through a dual-container architecture, integrates multiple LLM providers behind a unified API gateway, and subjects the entire stack to automated security testing.

Architecture

The framework separates concerns into two Docker containers orchestrated via Kubernetes and served through FastAPI with Uvicorn. Isolating data ingestion from model inference creates a clean security boundary and enables independent scaling.

Fig 1. Secure workflow for LLM/ML analysis on DER data showing the dual-container architecture, API endpoint testing, ML analysis pipelines, security testing, and performance metrics layers.

The data_service container ingests raw DER telemetry, aggregates it at configurable time intervals (1-min, 3-min, 5-min), and feeds structured data into four ML analysis pipelines: grid stability analysis using Isolation Forest and Random Forest, K-Means performance optimization, multi-interval temporal comparison, and predictive maintenance through regression and time-series forecasting. The llm_service container orchestrates GPT, Claude, Gemini, and Llama behind a unified gateway, translating ML outputs into natural-language operator recommendations.
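The unified gateway idea can be sketched in a few lines of Python. This is a minimal illustration, not the framework's actual code: the `LLMGateway` class, the `echo_adapter` helper, and the provider names used as keys are all hypothetical, and the adapters only echo the prompt where a real deployment would call each provider's SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A provider adapter takes a prompt and returns the model's text reply.
ProviderFn = Callable[[str], str]


@dataclass
class LLMGateway:
    """Routes prompts to named provider adapters behind one interface."""

    providers: Dict[str, ProviderFn] = field(default_factory=dict)

    def register(self, name: str, fn: ProviderFn) -> None:
        self.providers[name] = fn

    def query(self, provider: str, prompt: str) -> dict:
        if provider not in self.providers:
            raise KeyError(f"unknown provider: {provider}")
        reply = self.providers[provider](prompt)
        return {"provider": provider, "response": reply}


def echo_adapter(label: str) -> ProviderFn:
    """Hypothetical stand-in for a real SDK call; it only echoes the prompt."""
    return lambda prompt: f"[{label}] {prompt}"


gateway = LLMGateway()
for name in ("gpt", "claude", "gemini", "llama"):
    gateway.register(name, echo_adapter(name))

result = gateway.query("claude", "Summarize voltage anomalies")
print(result)
```

Keeping every provider behind the same callable signature means the calling code never changes when a provider is added or swapped, which is the property the gateway is meant to provide.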

Performance

Every component was benchmarked. Resource profiles varied across the model zoo. The anomaly detection pipeline was the most memory-intensive at 1,744 MB. The multi-interval comparison was the leanest at 793 MB. Query-GPT showed the highest CPU usage at 24.39%.

CPU Usage (GPT): 14.2%
ML Accuracy: 91.8%
Cost Savings (GCP): 16.4%

The multi-cloud cost analysis across six analytical operations showed GCP consistently delivered the most cost-effective deployment, with per-analysis costs between $0.0051 and $0.0153. Predictive analysis was the most compute-intensive at $8.11 per hour. Cluster analysis was cheapest at $3.67 per hour. AWS and Azure trailed GCP on cost efficiency across every workload.
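The relationship between an hourly rate and a per-analysis cost is simple pro rata arithmetic, sketched below. The function names are illustrative, and the 5-second runtime and the 4.0 and 3.0 rates in the example are made-up inputs, not measurements from the study; only the $3.67 hourly rate comes from the text above.

```python
def cost_per_analysis(hourly_rate_usd: float, runtime_seconds: float) -> float:
    """Cost of one run when compute is billed pro rata at an hourly rate."""
    return hourly_rate_usd * runtime_seconds / 3600.0


def savings_percent(baseline_usd: float, candidate_usd: float) -> float:
    """Relative saving of the candidate versus the baseline, in percent."""
    return (baseline_usd - candidate_usd) / baseline_usd * 100.0


# $3.67/hour is the cluster-analysis rate quoted above.
# The 5-second runtime is a hypothetical value chosen for illustration.
example_cost = cost_per_analysis(3.67, 5.0)
print(f"${example_cost:.4f} per analysis")

# Hypothetical hourly rates for two providers, to show the savings formula.
print(f"{savings_percent(4.0, 3.0):.1f}% cheaper")
```

Under this model, short-running analyses cost fractions of a cent even at several dollars per hour, which is consistent with the order of magnitude reported above.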

[Figure: cost per analysis ($) for Query GPT, Anomaly Detection, Cluster Analysis, Predictive Analysis, Comprehensive ML, and Aggregate (3-min); series: AWS / Azure and GCP]
Fig 2. Cost comparison across cloud providers per analysis type. GCP is cheapest across all six operations.

Security

The framework subjects the containerized stack to a multi-layered assessment using Trivy, scanning base OS components (Debian 12.11) and the Python dependency tree. The scan identified 8 vulnerabilities across 7 CVEs, including issues in libc6, perl-base, zlib1g, setuptools, and starlette. API keys are stored with environment-level isolation. Network policies enforce container-to-container communication boundaries. Runtime monitoring catches behavioral anomalies that static scanning misses.

Container scan: Trivy against Debian 12.11 base image + Python packages
API testing: Endpoint validation for /health, /query_gpt, /analyze_data, /metrics
Network: Container isolation policies, inter-service communication boundaries
Runtime: Behavioral anomaly detection, environment-level key isolation
Fig 3. Four-layer security assessment stack applied to the containerized LLM framework.
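A scan can report more findings than distinct CVEs because the same CVE may affect several packages. The sketch below shows one way to derive both counts from a Trivy JSON report (as produced with `trivy image --format json`). The `summarize_trivy` helper is hypothetical, and the sample report uses placeholder package names and CVE identifiers, not the findings from this study.

```python
from collections import Counter


def summarize_trivy(report: dict) -> dict:
    """Count findings, distinct CVE IDs, and findings per severity."""
    findings = [
        vuln
        for result in report.get("Results", [])
        # A target with no findings may omit the key or set it to null.
        for vuln in result.get("Vulnerabilities") or []
    ]
    severities = Counter(v.get("Severity", "UNKNOWN") for v in findings)
    return {
        "findings": len(findings),
        "unique_cves": len({v["VulnerabilityID"] for v in findings}),
        "by_severity": dict(severities),
    }


# Placeholder data shaped like a Trivy report; every value is made up.
sample_report = {
    "Results": [
        {
            "Target": "os-packages (example)",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-EXAMPLE-1", "PkgName": "pkg-a", "Severity": "HIGH"},
                {"VulnerabilityID": "CVE-EXAMPLE-1", "PkgName": "pkg-b", "Severity": "HIGH"},
            ],
        },
        {
            "Target": "python-packages (example)",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-EXAMPLE-2", "PkgName": "pkg-c", "Severity": "MEDIUM"},
            ],
        },
        {"Target": "clean-layer (example)"},
    ]
}

summary = summarize_trivy(sample_report)
print(summary)
```

In a CI pipeline, a summary like this could gate deployment, for example by failing the build when any HIGH or CRITICAL finding is present.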

Running the Framework

The framework requires Docker, a valid OpenAI API key, and optionally a Kubernetes cluster for multi-cloud orchestration.

Prerequisites

docker --version        # Docker 20.10+
python3 --version       # Python 3.10+

export OPENAI_API_KEY="sk-..."
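On the service side, reading the key from the environment instead of hard-coding it might look like the sketch below. The `load_api_key` and `redact` helpers are hypothetical names for illustration and are not taken from the repository.

```python
import os


def load_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Read a secret from the environment and fail fast if it is missing."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it or add it to .env")
    return value


def redact(secret: str, visible: int = 4) -> str:
    """Mask a secret for logging, keeping only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Failing at startup when the variable is missing surfaces configuration mistakes immediately, and redacting before logging keeps the key out of container logs.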

Build and start the containers

git clone https://github.com/jayapreethi/der_llm_framework.git
cd der_llm_framework

cp .env.example .env
# Add your API keys to .env

docker-compose up --build -d
docker ps

Test the API endpoints

# Health check
curl http://localhost:8000/health

# Query GPT with a DER analysis prompt
curl -X POST http://localhost:8000/query_gpt \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Analyze grid stability for current DER readings"}'

# Run DER data analysis with 1-min aggregation
curl -X POST http://localhost:8000/analyze_data \
  -H "Content-Type: application/json" \
  -d '{"analysis_type": "summary", "interval": "1min"}'

# Fetch performance metrics
curl http://localhost:8000/metrics/table
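The same calls can be scripted from Python with only the standard library, which is convenient for automated endpoint tests. This is a sketch under the assumption that the endpoints accept and return JSON exactly as in the curl examples; `build_request` and `call` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_request(path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET request, or a JSON POST when a payload is given."""
    if payload is None:
        return urllib.request.Request(BASE_URL + path, method="GET")
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, payload: Optional[dict] = None, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_request(path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the containers running, `call("/health")` mirrors the health check, and `call("/analyze_data", {"analysis_type": "summary", "interval": "1min"})` mirrors the 1-minute aggregation request.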

Deploy to AWS

aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin \
  <account-id>.dkr.ecr.us-east-1.amazonaws.com

docker tag der_llm_data_service:latest \
  <account-id>.dkr.ecr.us-east-1.amazonaws.com/data_service:latest

docker push \
  <account-id>.dkr.ecr.us-east-1.amazonaws.com/data_service:latest

aws logs tail /ecs/der-llm-framework --follow

Security scan

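# If apt cannot find the package, add the Aqua Security apt repository first (see the Trivy installation docs)
sudo apt-get install -y trivy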

trivy image der_llm_data_service:latest
trivy image der_llm_llm_service:latest

ML Analysis Pipelines

The data_service container exposes four analytical capabilities through its API endpoints, each combining classical ML with LLM-guided interpretation.

Grid stability analysis applies a multi-model approach using anomaly detection with Isolation Forest, clustering with K-Means, and predictive classification with Random Forest to monitor frequency, voltage, and power factor. The models achieved 91.8% prediction accuracy and a performance score of 0.92.

Performance optimization uses K-Means clustering on key parameters including power factor, connection window times, and ramp times. Two clusters were identified in the 1-minute interval dataset.
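For readers unfamiliar with K-Means, the sketch below implements the basic Lloyd iteration in plain Python on two features. It is a teaching illustration, not the framework's implementation: the farthest-first seeding is a deterministic choice made for this example, the sample points are made up, and real features on different scales would need standardizing first.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def farthest_first(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points: List[Point], k: int = 2, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = farthest_first(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the result is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Made-up (power_factor, ramp_time_s) pairs forming two obvious groups.
samples = [(0.98, 1.0), (0.97, 1.2), (0.80, 5.0), (0.82, 5.3)]
labels, centroids = kmeans(samples, k=2)
print(labels, centroids)
```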

Multi-interval comparison applies time-series analysis across temporal granularities, delivering scores of 0.782 for 1-minute, 0.072 for 3-minute, and 0.807 for 5-minute intervals across frequency, power factor, and voltage parameters.
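The aggregation step that feeds this comparison can be sketched as bucketing timestamped readings into fixed windows and averaging each window. The `aggregate` function, the midnight-aligned windows, and the sample frequency readings are assumptions for illustration; the framework's own aggregation logic may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[datetime, float]


def aggregate(readings: Iterable[Reading], minutes: int) -> Dict[datetime, float]:
    """Average readings into fixed windows of `minutes` length.

    Windows are aligned to midnight and keyed by their start time.
    """
    width = minutes * 60
    buckets: Dict[datetime, List[float]] = defaultdict(list)
    for ts, value in readings:
        seconds_into_day = ts.hour * 3600 + ts.minute * 60 + ts.second
        offset = seconds_into_day % width
        start = ts.replace(microsecond=0) - timedelta(seconds=offset)
        buckets[start].append(value)
    return {start: fmean(vals) for start, vals in sorted(buckets.items())}


# Made-up grid frequency readings (Hz), one per minute.
t0 = datetime(2025, 1, 1, 0, 0, 0)
values = [60.00, 60.02, 59.98, 60.04, 60.00, 59.96]
readings = [(t0 + timedelta(minutes=i), v) for i, v in enumerate(values)]

for interval in (1, 3, 5):
    print(interval, aggregate(readings, interval))
```

Coarser windows smooth out short excursions, which is one reason the same analysis can score differently at different granularities.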

Predictive maintenance uses regression analysis and time-series forecasting to detect potential equipment failures by analyzing power factor, voltage, and connection status trends.
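As a simple illustration of trend-based prediction, the sketch below fits an ordinary least-squares line to a series and estimates when it would cross a threshold. The `time_to_threshold` helper, the 0.90 threshold, and the sample power-factor values are hypothetical; the framework's regression and forecasting models are not specified at this level of detail.

```python
from statistics import fmean
from typing import Optional, Sequence, Tuple


def fit_line(times: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * t + intercept."""
    mean_t, mean_v = fmean(times), fmean(values)
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_to_threshold(
    times: Sequence[float], values: Sequence[float], threshold: float
) -> Optional[float]:
    """Time at which the fitted line reaches `threshold`.

    Returns None if the trend is flat or the crossing is not in the future.
    """
    slope, intercept = fit_line(times, values)
    if slope == 0:
        return None
    crossing = (threshold - intercept) / slope
    return crossing if crossing > times[-1] else None


# Made-up power-factor readings that decline by 0.01 per time step.
steps = [0, 1, 2, 3, 4]
power_factor = [1.00, 0.99, 0.98, 0.97, 0.96]
eta = time_to_threshold(steps, power_factor, threshold=0.90)
print(eta)
```

A straight-line extrapolation is only a first approximation, but it shows how a degrading trend can be turned into a lead time for scheduling maintenance.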

[Figure: CPU and memory utilization by model type]
CPU (%): Query-GPT 24.39, Analyze-Data 13.04, Detect-Anomalies 17.18, Cluster-Analysis 14.53, Predictive-Analysis 15.36, Comprehensive-ML 7.10, Compare-Intervals 7.24
Memory (MB): Detect-Anomalies 1744, Query-GPT 1703, Analyze-Data 1302, Compare-Intervals 793
Fig 4. CPU percentage and memory consumption across analytical model types.

Applications Beyond Energy

The dual-container pattern solves a class of problems common across industries where sensitive operational data meets AI-driven analytics. The architecture is domain-agnostic. What changes between deployments is the data schema in the ingestion container and the prompt templates in the LLM service.

In healthcare, the same separation isolates protected health information inside the data container while the LLM service generates clinical summaries, risk stratification reports, or radiology interpretations without direct access to patient records. The security scanning pipeline checks for vulnerabilities relevant to HIPAA compliance before deployment.

In financial services, transaction monitoring and fraud detection alerts route through the data container while the LLM layer generates compliance narratives and audit trail documentation. Multi-cloud deployment enables geographic data residency, so EU transaction data stays on EU infrastructure.

In manufacturing and industrial IoT, sensor telemetry from SCADA systems and PLCs flows through the data container with the same configurable time-interval aggregation. The LLM service translates anomaly detection outputs into maintenance work orders and root-cause analysis reports.

In cybersecurity operations, log data and threat intelligence feeds are ingested through the data service, pass through ML-based anomaly detection, and reach the LLM layer for automated incident report generation. Container isolation limits lateral movement if the model layer is compromised.

In precision agriculture, drone imagery metadata, soil sensor readings, and weather data are aggregated at configurable intervals. The LLM layer generates crop yield predictions, irrigation schedules, and pest risk assessments, with deployments spread across regional cloud zones.

The dual-container architecture separates what changes (models, prompts, providers) from what must remain stable (data pipelines, security policies, compliance boundaries), enabling secure LLM deployment across any cloud or HPC environment.

Citation: J. P. Mohan and P. Ranganathan, "Containerized Deployment of Secure LLM Workflows in Multi-Cloud Infrastructures," 2025 IEEE Cloud Summit, DOI: 10.1109/Cloud-Summit64795.2025.00027

This research is funded by the U.S. Department of Energy, Office of Cybersecurity, Energy Security, and Emergency Response (CESER) under contract TL0401010-05907-4219031 (Agreement #51582).

Written by Jaya Preethi Mohan.