Glossary
(Beginner-Friendly Reference for Local Compute and AI Concepts)
Compute & Hardware
Compute / Compute Power — The processing ability available to run tasks. More compute = faster results, higher energy cost.
CPU (Central Processing Unit) — General-purpose chip that handles most computer tasks.
GPU (Graphics Processing Unit) — Chip that runs many small calculations in parallel; ideal for AI and image work.
TPU (Tensor Processing Unit) — Specialized chip for AI math, designed by Google; usually accessible only through the cloud.
ASIC (Application-Specific Integrated Circuit) — Custom hardware optimized for one task, such as inference or mining.
GPU Model (e.g., A10G, RTX 4090) — The specific chip version; defines speed, memory (VRAM), and power draw.
VRAM (Video RAM) — Memory on a GPU that stores model data during inference; limits model size (see the sizing example at the end of this section).
Instance / Virtual Machine (VM) — A rented computer in the cloud with chosen CPU, GPU, and RAM.
EC2 / Instance Type (e.g., g5.xlarge) — Amazon’s VM service and its preset hardware bundles.
On-Prem (On-Premises) — Hardware physically owned and operated by your organization.
Edge Device — A nearby device (phone, workstation, router, sensor) that processes data locally.
Colocation Facility (Co-lo) — A data center where you rent rack space but manage your own servers.
Container (e.g., Docker) — Lightweight software package that runs consistently across machines.
Kubernetes (K8s) — Software that automates the deployment and management of many containers.
Inference — Using a trained model to generate outputs (answers, images); distinct from training.
Inference Server — Software that exposes a local model as an API for apps to call.
Batch Size — Number of inputs processed at once; affects speed, memory use, and efficiency.
Token — A small chunk of text the model reads or writes; costs and speed scale with token count.
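To make the VRAM entry above concrete, here is a minimal sizing sketch: weight memory is roughly parameter count times bytes per parameter, and quantization shrinks the bytes. The 7-billion-parameter model size and the 1.2x overhead factor are illustrative assumptions, not fixed rules.

```python
# Back-of-the-envelope VRAM estimate for holding model weights at inference.
# Assumption: weights dominate memory; the 1.2x factor is a rough allowance
# for activations and KV cache, not a precise rule.

def estimate_vram_gb(params_billion: float, bits_per_param: int,
                     overhead: float = 1.2) -> float:
    bytes_per_param = bits_per_param / 8
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes / 1e9
    return weights_gb * overhead

# Example: a hypothetical 7-billion-parameter model at common precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{estimate_vram_gb(7, bits):.1f} GB of VRAM")
```

At 16-bit precision the example model needs roughly 17 GB, which is why quantization (next section) matters for fitting models on consumer GPUs.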
Models & Optimization
Model / Large Language Model (LLM) — Software trained on data to predict or generate language, images, or code.
Open-Source Model — A model whose code and weights are publicly shared for inspection or hosting.
Fine-tuning — Continuing to train a pre-trained model on specific data to specialize it.
PEFT (Parameter-Efficient Fine-Tuning) — Techniques that update only small parts of a model to reduce cost.
LoRA (Low-Rank Adaptation) — A popular PEFT method that fine-tunes small “adapter” layers instead of the full model (see the sketch at the end of this section).
QLoRA / QA-LoRA — LoRA combined with quantization to fit large models on limited VRAM.
Quantization — Reducing numeric precision (e.g., 16-bit → 4-bit) to shrink model size and speed up inference (worked example at the end of this section).
Quantization-Aware Training (QAT) — Teaching a model to work accurately at low precision during training.
Distillation — Compressing a large “teacher” model into a smaller “student” model that mimics it.
Embeddings — Numeric representations of text or images that capture meaning for search and retrieval.
Checkpoint — A saved snapshot of a model’s weights that can be resumed or shared.
Inference Engine (e.g., ONNX Runtime, TensorRT) — Software optimized to run models efficiently on specific hardware.
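The LoRA entry above is small enough to sketch in a few lines of NumPy: instead of updating the large frozen weight matrix W, training touches only two small matrices A and B whose scaled product is added to W. The dimensions and scaling constants below are illustrative assumptions.

```python
import numpy as np

# Minimal LoRA sketch: W' = W + (alpha / r) * B @ A.
# Sizes are illustrative; real adapters attach to attention layers.
d, r, alpha = 1024, 8, 16            # hidden size, adapter rank, scaling

W = np.random.randn(d, d)            # frozen pretrained weights (not trained)
A = np.random.randn(r, d) * 0.01     # small trainable matrix
B = np.zeros((d, r))                 # zero-initialized so W' == W at the start

W_adapted = W + (alpha / r) * (B @ A)

# Only A and B are trained: compare parameter counts.
print(f"full matrix: {d * d:,} params, adapter: {2 * d * r:,} params")
```

The parameter counts at the end (about a million versus about sixteen thousand in this toy setup) are the whole point: the adapter is a tiny fraction of the layer it modifies.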
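Likewise, the heart of the Quantization entry fits in a short sketch: map floating-point weights onto a small integer grid with one scale factor, then map back at run time. This is plain symmetric 8-bit rounding, a deliberate simplification of what real quantization libraries do.

```python
import numpy as np

# Toy symmetric int8 quantization: store weights as 1-byte integers plus a
# single float scale, then reconstruct approximate values when needed.
weights = np.random.randn(5).astype(np.float32)

scale = np.abs(weights).max() / 127                     # largest value -> 127
quantized = np.round(weights / scale).astype(np.int8)   # 4 bytes -> 1 byte each
restored = quantized.astype(np.float32) * scale         # approximate originals

print("original :", weights)
print("restored :", restored)
print("max error:", np.abs(weights - restored).max())
```

The restored values differ slightly from the originals; that small accuracy loss is the price of a 4x size reduction at this precision.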
Storage & Data Movement
Object Storage (e.g., S3) — Stores files (“objects”); cheap to keep, but you pay per GB to download them back out (see Data Egress).
Block Storage (e.g., EBS) — Disk space attached to a VM; fast while the VM runs.
File Storage (e.g., NFS) — Traditional shared-folder storage used in offices and small servers.
Data Egress — Data leaving the cloud; billed per GB and often the largest hidden cost (worked example at the end of this section).
Data Ingress — Data entering the cloud; typically cheaper or free.
Bandwidth — Maximum data transfer capacity of a connection.
Throughput — Actual amount of data transferred over time.
Latency — Delay before data transfer begins; shorter = snappier performance (worked example at the end of this section).
Caching — Temporarily storing data close to where it’s used for faster access.
Data Residency / Locality — Where data physically lives; affects laws, latency, and control.
Checksum / Hash — Verification code confirming a file’s integrity during transfer (example at the end of this section).
Data Lake — Centralized repository for raw, unstructured data.
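As a worked example for the Data Egress entry above, here is the arithmetic behind a transfer bill. The $0.09-per-GB rate is only an illustrative assumption; real prices vary by provider, region, and volume tier.

```python
# Illustrative egress bill; the price per GB is an assumption, not a quote.
price_per_gb = 0.09   # USD per GB, hypothetical cloud egress rate
dataset_gb = 500      # pulling a 500 GB dataset out of the cloud

print(f"one full download: ${dataset_gb * price_per_gb:,.2f}")
print(f"per month, if repeated weekly: ${dataset_gb * price_per_gb * 4:,.2f}")
```

A one-off $45 download is easy to miss on an invoice; repeating it weekly quietly becomes $180 a month, which is how egress earns its reputation as a hidden cost.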
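The Bandwidth, Throughput, and Latency entries above are easy to conflate; this sketch separates them with the idealized model transfer time = latency + size / bandwidth. All numbers are illustrative.

```python
# Idealized transfer time: startup latency plus size over bandwidth.
def transfer_seconds(size_mb: float, latency_ms: float,
                     bandwidth_mbps: float) -> float:
    return latency_ms / 1000 + size_mb * 8 / bandwidth_mbps  # MB -> megabits

# A tiny request is dominated by latency; a large file by bandwidth.
print(f"   1 MB, 50 ms, 100 Mbps: {transfer_seconds(1, 50, 100):.2f} s")
print(f"1000 MB, 50 ms, 100 Mbps: {transfer_seconds(1000, 50, 100):.1f} s")
```

This is why interactive apps feel sluggish on a high-latency link no matter how fast the connection is rated, while bulk downloads care almost entirely about bandwidth.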
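And for the Checksum / Hash entry, a minimal sketch using hashlib from Python's standard library: if the hash computed after a transfer matches the one computed before, the file arrived intact. The file name in the usage comment is hypothetical.

```python
import hashlib

def sha256_of_file(path: str) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical usage: compare against the checksum the sender published.
# assert sha256_of_file("model.safetensors") == published_checksum
```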
Networking & Performance
API (Application Programming Interface) — Rules for programs to talk to each other, locally or via cloud.
Proprietary API / SDK — Interfaces owned by one vendor; easy to start, hard to switch away from.
API Gateway — A single entry point that manages and secures multiple APIs.
Serverless — Cloud model where the provider manages the servers for you; you pay only while your code runs.
Load Balancer — Distributes traffic evenly across servers.
CDN (Content Delivery Network) — Network of cached servers that bring data closer to users.
QoS (Quality of Service) — Prioritizing certain network traffic for steady performance.
WebSocket / Stream — Continuous data connection used for chat, gaming, or live inference.
Burst Scaling — Temporarily adding capacity for spikes in demand, then scaling down.
Cost & Operations
Pay-as-you-go (PAYG) — Billing only for what you use; predictable for small tests, volatile at scale.
Reserved Instance — Discounted VM reserved for 1–3 years; trades flexibility for lower cost.
Spot Instance / Pre-emptible VM — Cheap temporary compute that can be reclaimed anytime.
Total Cost of Ownership (TCO) — All-in cost over time: hardware, energy, cloud fees, support, and staff (break-even example at the end of this section).
CapEx vs. OpEx — Capital Expenditure (buy hardware) vs. Operating Expenditure (rent cloud).
Cost Calculator (e.g., AWS Calculator, Infracost) — Tools to estimate compute, storage, and transfer spend.
Managed Service — Provider-run platform (e.g., database, ML pipeline); convenient but adds dependency.
Abstraction Layer — Software that hides infrastructure details to simplify use or multi-cloud setups.
Hybrid Architecture — Using both local and cloud resources for balance.
Multi-Cloud — Using multiple cloud providers to avoid lock-in, with added complexity.
Egress Fee — Charge for moving data out of a cloud; often unnoticed until billing.
Elastic IP — Static public IP for a VM; may incur charges when idle.
Energy Footprint — The electricity used by your workloads, local or cloud.
Watt (W) / Kilowatt-hour (kWh) — Instant power vs. energy over time; utility bills use kWh.
Overhead — Indirect costs (maintenance, support, redundancy) not visible in per-use pricing.
SLA (Service Level Agreement) — Provider’s formal uptime and reliability guarantee.
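To ground the TCO and CapEx vs. OpEx entries above, here is a simplified buy-versus-rent break-even sketch. Every price in it (hardware cost, cloud hourly rate, electricity) is an illustrative assumption, and it ignores staff time, cooling, and resale value.

```python
# Simplified buy-vs-rent break-even; every number here is an assumption.
gpu_cost = 1800.00     # USD, one-time hardware purchase (CapEx)
cloud_rate = 1.10      # USD per GPU-hour rented (OpEx)
power_kw = 0.35        # local GPU draw under load, in kilowatts
electricity = 0.15     # USD per kWh on the utility bill

local_hourly = power_kw * electricity                # running cost per hour
breakeven = gpu_cost / (cloud_rate - local_hourly)   # hours until buying wins

print(f"local running cost: ${local_hourly:.3f}/hour")
print(f"break-even after ~{breakeven:,.0f} GPU-hours of use")
```

Under these made-up numbers, buying pays off after roughly 1,700 GPU-hours; light or bursty workloads rarely get there, which is the usual argument for renting.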
Governance, Privacy & Risk
Digital Sovereignty — Control over where your data lives, who accesses it, and how it’s used.
Compliance / Auditing — Meeting legal or industry rules and keeping records.
Vendor Lock-In — When proprietary tools or formats make switching providers difficult.
Attack Surface — All the possible points an attacker could exploit.
Visibility / Observability — Ability to see how systems behave via logs and metrics.
Data Governance — Policies controlling data access and handling.
Zero Trust — Security model that verifies every request instead of assuming safety.
Encryption at Rest / In Transit — Protects stored or transmitted data with encryption (see the sketch at the end of this section).
Federated Learning — Training shared models across many devices without moving data.
Compliance Frameworks (GDPR, HIPAA, ISO 27001) — Standard rule sets for privacy and security.
Data Lineage — Traceable record of where data came from and how it changed.
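As a small illustration of the Encryption at Rest entry above, here is a sketch using the Fernet recipe from the third-party cryptography package (installed with pip install cryptography). Key management, the genuinely hard part in practice, is reduced to a comment.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# Encrypt data before writing it to disk; only the key holder can read it.
key = Fernet.generate_key()   # in practice, keep this in a key manager
box = Fernet(key)

ciphertext = box.encrypt(b"patient record 1234")   # safe to store anywhere
plaintext = box.decrypt(ciphertext)                # requires the key

assert plaintext == b"patient record 1234"
```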
Tools & Workflows (Beginner-Friendly)
Ollama — Simple tool to download and run local LLMs with minimal setup (see the example at the end of this section).
LM Studio — Desktop app for running and testing local models via a GUI.
ComfyUI — Visual workflow builder for image generation and AI pipelines.
RunPod / Lambda Labs / Vast.ai — Marketplaces for renting GPUs by the hour, typically with transparent pricing.
ONNX (Open Neural Network Exchange) — Standard format for sharing models between tools.
Weights & Biases / MLflow — Tools for tracking experiments and model performance.
Prometheus / Grafana — Monitoring and visualization tools for infrastructure metrics.
Community Cloud / Shared Compute — University or co-op servers with transparent governance.
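Finally, to show what an Inference Server looks like from the caller's side, here is a sketch that sends a prompt to Ollama's local HTTP endpoint using only Python's standard library. It assumes Ollama is running on its default port and that a model named llama3 has already been pulled; substitute whatever model you actually have.

```python
import json
import urllib.request

# Call a locally running Ollama server (default port 11434).
# Assumes `ollama pull llama3` has been run; change the model name as needed.
payload = {
    "model": "llama3",
    "prompt": "Define VRAM in one sentence.",
    "stream": False,   # return one JSON object instead of a token stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```

The same pattern, an HTTP POST to localhost, is what LM Studio and most inference servers expose, which is why apps can swap a local model for a cloud API with little code change.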
Emerging Concepts (Optional for Further Reading)
Edge-to-Cloud Continuum — The spectrum between device-level and cloud computing.
Digital Twin — A virtual model of a real system for simulation or testing.
Embodied AI — AI integrated with robots or sensors that interact with the physical world.
Lifecycle Assessment (LCA) — Measuring the environmental impact of hardware or software over time.
Compute Credit / Carbon Credit — Units used to offset or account for energy consumption in AI operations.
Quick Cross-References
Thinking “Where do I run this?” → Hybrid Architecture, Latency, TCO
Thinking “How do I customize a model cheaply?” → LoRA, PEFT, Quantization
Thinking “Why is my bill high?” → PAYG, Data Egress, Managed Service, TCO
Thinking “How do I keep control?” → Digital Sovereignty, Data Residency, Vendor Lock-In, Visibility