No cloud rental. No shared resources. Dedicated GPU hardware running AI inference from our own facility in Norway.
Purpose-built for AI workloads. Every component chosen for maximum inference throughput.
The crown jewel. NVIDIA's flagship consumer GPU powers our local AI inference, running Qwen 72B at ~45 tokens/sec with room to spare.
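That throughput figure is easy to check rather than take on faith: the serving layer (Ollama, detailed below) reports token counts and timings with every response. A minimal sketch, assuming the default local port and a hypothetical model tag:

```python
import requests

# Hedged sketch: assumes Ollama on its default port; the model tag is
# illustrative. eval_count and eval_duration come back in Ollama's
# non-streaming /api/generate response.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen2.5:72b", "prompt": "Why is the sky blue?", "stream": False},
).json()

# eval_duration is reported in nanoseconds.
tokens_per_sec = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(f"{tokens_per_sec:.1f} tokens/sec")
```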
Handles preprocessing, API routing, web serving, and parallel task execution while the GPU focuses on inference.
Enables running massive models with CPU offloading. When VRAM isn't enough, system RAM picks up the slack.
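In practice the split is a single knob: Ollama's num_gpu option caps how many model layers sit in VRAM, and the rest run on the CPU out of system RAM. A hedged sketch (the layer count is illustrative, not a tuned value):

```python
import requests

# Partial GPU offload via Ollama request options. "num_gpu" sets how
# many layers are placed on the GPU; the remainder are evaluated on
# the CPU from system RAM. 40 is an arbitrary example value.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5:72b",      # hypothetical tag
        "prompt": "Summarize DNS in one sentence.",
        "stream": False,
        "options": {"num_gpu": 40},  # layers kept in VRAM
    },
).json()
print(resp["response"])
```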
Model loading times measured in seconds, not minutes. Hot-swapping between AI models is practically instant.
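Seen from the API, hot-swapping comes down to residency: the first request to a model pays the load-from-disk cost, and a keep_alive hint keeps it in memory for follow-up calls. A rough timing sketch (model tag and duration are illustrative):

```python
import time
import requests

def timed_generate(model: str) -> float:
    """Time one /api/generate round-trip for the given model tag."""
    start = time.perf_counter()
    requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": "ping",
            "stream": False,
            "keep_alive": "10m",  # keep the model resident after replying
        },
    ).raise_for_status()
    return time.perf_counter() - start

cold = timed_generate("qwen2.5:72b")  # pays the load cost
warm = timed_generate("qwen2.5:72b")  # model already hot
print(f"cold: {cold:.1f}s, warm: {warm:.1f}s")
```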
From OS to application layer — every component hand-selected and configured.
We don't just host websites. We run the entire internet stack — DNS, mail, web, databases, and AI.
Authoritative nameservers for all venture domains. Full zone management, DNSSEC ready, with automated record management via Virtualmin. A quick lookup sketch follows the domain list below.
gilligantech.com
davegilligan.com
bluenotelogic.com
gilligan.tech
triviaandtunes.no
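Anyone can confirm who answers for these zones from the outside. A minimal lookup sketch using the dnspython package (any resolver library would do):

```python
import dns.resolver  # pip install dnspython

# Ask the public DNS tree who is authoritative for one of the zones.
for record in dns.resolver.resolve("gilligantech.com", "NS"):
    print("NS:", record.target)

# DNSSEC readiness: a signed zone publishes DNSKEY records.
try:
    for key in dns.resolver.resolve("gilligantech.com", "DNSKEY"):
        print("DNSKEY flags:", key.flags)
except dns.resolver.NoAnswer:
    print("No DNSKEY published yet (DNSSEC ready, not signed)")
```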
Self-hosted email infrastructure with full authentication. DKIM signing, SPF records, DMARC policies — email that lands in the inbox, not the spam folder.
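Because all three mechanisms live in public DNS, the deliverability posture can be spot-checked in a few lines (dnspython again; the DKIM selector is a guess, since selectors vary per install):

```python
import dns.resolver  # pip install dnspython

domain = "gilligantech.com"

# SPF: a TXT record at the domain apex starting with "v=spf1".
for txt in dns.resolver.resolve(domain, "TXT"):
    text = b"".join(txt.strings).decode()
    if text.startswith("v=spf1"):
        print("SPF:", text)

# DMARC: policy published at _dmarc.<domain>.
for txt in dns.resolver.resolve(f"_dmarc.{domain}", "TXT"):
    print("DMARC:", b"".join(txt.strings).decode())

# DKIM: public key at <selector>._domainkey.<domain>; "default"
# is only a guessed selector name.
try:
    for txt in dns.resolver.resolve(f"default._domainkey.{domain}", "TXT"):
        print("DKIM:", b"".join(txt.strings).decode())
except dns.resolver.NXDOMAIN:
    print("No 'default' DKIM selector; selectors vary per install")
```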
Automated SSL certificate management via Let's Encrypt. Every domain secured with HTTPS, auto-renewal, and HSTS headers.
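Renewal is verifiable from any client, since a TLS handshake exposes the certificate's expiry. A small sketch with Python's standard library:

```python
import socket
import ssl

def cert_not_after(host: str, port: int = 443) -> str:
    """Connect over TLS and return the peer certificate's expiry date."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]

print(cert_not_after("gilligantech.com"))
```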
MySQL 8 with optimized configurations for both web applications and AI workloads. Connection pooling, tuned InnoDB buffer pools, and automated backups.
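Connection pooling at the application layer looks roughly like this, sketched with the mysql-connector-python package (credentials, names, and pool size are placeholders, not production values):

```python
from mysql.connector import pooling  # pip install mysql-connector-python

# Hypothetical pool configuration; every value here is a placeholder.
pool = pooling.MySQLConnectionPool(
    pool_name="web_pool",
    pool_size=8,
    host="127.0.0.1",
    user="app_user",
    password="***",
    database="app_db",
)

# Borrow a connection, run a query, and hand it back on close.
conn = pool.get_connection()
try:
    cur = conn.cursor()
    cur.execute("SELECT VERSION()")
    print(cur.fetchone()[0])
finally:
    conn.close()  # returns the connection to the pool, not the server
```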
Apache with PHP-FPM, virtual hosts, and mod_rewrite, tuned for both traditional web applications and AI API proxying.
Self-hosted AI model serving via Ollama. GPU-accelerated inference with REST API access, model management, and health monitoring.
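The REST surface makes health monitoring and model management one-liners; a minimal check, assuming the default local port:

```python
import requests

BASE = "http://localhost:11434"

# Health: the root endpoint answers with a short status string.
print(requests.get(f"{BASE}/").text)  # "Ollama is running"

# Model management: list every model pulled onto the box.
for model in requests.get(f"{BASE}/api/tags").json()["models"]:
    print(model["name"], model["size"])
```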
Sensitive data never leaves our infrastructure. No cloud provider sees our prompts, our code, or our clients' information. Full GDPR compliance by design.
Local inference means no round-trip to a remote data center. When milliseconds matter — real-time gaming, live AI grading, interactive demos — local wins every time.
After the initial hardware investment, inference costs little more than electricity. No per-token pricing. No surprise bills. No rate limits. Run as much AI as you want.
Custom models, custom configurations, custom everything. No vendor restrictions on model parameters, system prompts, or usage patterns.
Internet goes down? Our local AI keeps running. Critical systems never depend on external connectivity. True operational resilience.
Try new models instantly. Fine-tune parameters without cost concerns. A/B test different configurations. The hardware is always available for exploration.