Glossary
(Beginner-Friendly Reference for Local Compute and AI Concepts)
Compute & Hardware
Compute / Compute Power — The processing ability available to run tasks. More compute = faster results, higher energy cost.
CPU (Central Processing Unit) — General-purpose chip that handles most computer tasks.
GPU (Graphics Processing Unit) — Chip that runs many small calculations in parallel; ideal for AI and image work.
TPU (Tensor Processing Unit) — Specialized chip for AI math, designed by Google; usually accessible only through the cloud.
ASIC (Application-Specific Integrated Circuit) — Custom hardware optimized for one task, such as inference or mining.
GPU Model (e.g., A10G, RTX 4090) — The specific chip version; defines speed, memory (VRAM), and power draw.
VRAM (Video RAM) — Memory on a GPU that stores model data during inference; limits model size (see the sizing example at the end of this section).
Instance / Virtual Machine (VM) — A rented computer in the cloud with chosen CPU, GPU, and RAM.
EC2 / Instance Type (e.g., g5.xlarge) — Amazon’s VM service and its preset hardware bundles.
On-Prem (On-Premises) — Hardware physically owned and operated by your organization.
Edge Device — A nearby device (phone, workstation, router, sensor) that processes data locally.
Colocation Facility (Co-lo) — A data center where you rent rack space but manage your own servers.
Container (e.g., Docker) — Lightweight software package that runs consistently across machines.
Kubernetes (K8s) — Software that automates the deployment and management of many containers.
Inference — Using a trained model to generate outputs (answers, images); distinct from training.
Inference Server — Software that exposes a local model as an API for apps to call.
Batch Size — Number of inputs processed at once; affects speed, memory use, and efficiency.
Token — A small chunk of text the model reads or writes; costs and speed scale with token count.
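To make the VRAM entry above concrete, here is a minimal sizing sketch: weight memory is roughly parameter count times bytes per parameter, and quantization shrinks the bytes. The 7-billion-parameter model size and the 1.2x overhead factor are illustrative assumptions, not fixed rules.

```python
# Back-of-the-envelope VRAM estimate for holding model weights at inference.
# Assumption: weights dominate memory; the 1.2x factor is a rough allowance
# for activations and KV cache, not a precise rule.

def estimate_vram_gb(params_billion: float, bits_per_param: int,
                     overhead: float = 1.2) -> float:
    bytes_per_param = bits_per_param / 8
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes / 1e9
    return weights_gb * overhead

# Example: a hypothetical 7-billion-parameter model at common precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{estimate_vram_gb(7, bits):.1f} GB of VRAM")
```

At 16-bit precision the example model needs roughly 17 GB, which is why quantization (next section) matters for fitting models on consumer GPUs.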
Models & Optimization
Model / Large Language Model (LLM) — Software trained on data to predict or generate language, images, or code.
Open-Source Model — A model whose code and weights are publicly shared for inspection or hosting.
Fine-tuning — Continuing to train a pre-trained model on specific data to specialize it.
PEFT (Parameter-Efficient Fine-Tuning) — Techniques that update only small parts of a model to reduce cost.
LoRA (Low-Rank Adaptation) — A popular PEFT method that fine-tunes small “adapter” layers instead of the full model (see the sketch at the end of this section).
QLoRA / QA-LoRA — LoRA combined with quantization to fit large models on limited VRAM.
Quantization — Reducing numeric precision (e.g., 16-bit → 4-bit) to shrink model size and speed up inference (worked example at the end of this section).
Quantization-Aware Training (QAT) — Teaching a model to work accurately at low precision during training.
Distillation — Compressing a large “teacher” model into a smaller “student” model that mimics it.
Embeddings — Numeric representations of text or images that capture meaning for search and retrieval.
Checkpoint — A saved snapshot of a model’s weights that can be resumed or shared.
Inference Engine (e.g., ONNX Runtime, TensorRT) — Software optimized to run models efficiently on specific hardware.
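The LoRA entry above is small enough to sketch in a few lines of NumPy: instead of updating the large frozen weight matrix W, training touches only two small matrices A and B whose scaled product is added to W. The dimensions and scaling constants below are illustrative assumptions.

```python
import numpy as np

# Minimal LoRA sketch: W' = W + (alpha / r) * B @ A.
# Sizes are illustrative; real adapters attach to attention layers.
d, r, alpha = 1024, 8, 16            # hidden size, adapter rank, scaling

W = np.random.randn(d, d)            # frozen pretrained weights (not trained)
A = np.random.randn(r, d) * 0.01     # small trainable matrix
B = np.zeros((d, r))                 # zero-initialized so W' == W at the start

W_adapted = W + (alpha / r) * (B @ A)

# Only A and B are trained: compare parameter counts.
print(f"full matrix: {d * d:,} params, adapter: {2 * d * r:,} params")
```

The parameter counts at the end (about a million versus about sixteen thousand in this toy setup) are the whole point: the adapter is a tiny fraction of the layer it modifies.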
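Likewise, the heart of the Quantization entry fits in a short sketch: map floating-point weights onto a small integer grid with one scale factor, then map back at run time. This is plain symmetric 8-bit rounding, a deliberate simplification of what real quantization libraries do.

```python
import numpy as np

# Toy symmetric int8 quantization: store weights as 1-byte integers plus a
# single float scale, then reconstruct approximate values when needed.
weights = np.random.randn(5).astype(np.float32)

scale = np.abs(weights).max() / 127                     # largest value -> 127
quantized = np.round(weights / scale).astype(np.int8)   # 4 bytes -> 1 byte each
restored = quantized.astype(np.float32) * scale         # approximate originals

print("original :", weights)
print("restored :", restored)
print("max error:", np.abs(weights - restored).max())
```

The restored values differ slightly from the originals; that small accuracy loss is the price of a 4x size reduction at this precision.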
Storage & Data Movement
Object Storage (e.g., S3) — Stores files (“objects”); cheap to keep, but you pay per GB to download them back out (see Data Egress).
Block Storage (e.g., EBS) — Disk space attached to a VM; fast while the VM runs.
File Storage (e.g., NFS) — Traditional shared-folder storage used in offices and small servers.
Data Egress — Data leaving the cloud; billed per GB and often the largest hidden cost (worked example at the end of this section).
Data Ingress — Data entering the cloud; typically cheaper or free.
Bandwidth — Maximum data transfer capacity of a connection.
Throughput — Actual amount of data transferred over time.
Latency — Delay before data transfer begins; shorter = snappier performance (worked example at the end of this section).
Caching — Temporarily storing data close to where it’s used for faster access.
Data Residency / Locality — Where data physically lives; affects laws, latency, and control.
Checksum / Hash — Verification code confirming a file’s integrity during transfer (example at the end of this section).
Data Lake — Centralized repository for raw, unstructured data.
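As a worked example for the Data Egress entry above, here is the arithmetic behind a transfer bill. The $0.09-per-GB rate is only an illustrative assumption; real prices vary by provider, region, and volume tier.

```python
# Illustrative egress bill; the price per GB is an assumption, not a quote.
price_per_gb = 0.09   # USD per GB, hypothetical cloud egress rate
dataset_gb = 500      # pulling a 500 GB dataset out of the cloud

print(f"one full download: ${dataset_gb * price_per_gb:,.2f}")
print(f"per month, if repeated weekly: ${dataset_gb * price_per_gb * 4:,.2f}")
```

A one-off $45 download is easy to miss on an invoice; repeating it weekly quietly becomes $180 a month, which is how egress earns its reputation as a hidden cost.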
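The Bandwidth, Throughput, and Latency entries above are easy to conflate; this sketch separates them with the idealized model transfer time = latency + size / bandwidth. All numbers are illustrative.

```python
# Idealized transfer time: startup latency plus size over bandwidth.
def transfer_seconds(size_mb: float, latency_ms: float,
                     bandwidth_mbps: float) -> float:
    return latency_ms / 1000 + size_mb * 8 / bandwidth_mbps  # MB -> megabits

# A tiny request is dominated by latency; a large file by bandwidth.
print(f"   1 MB, 50 ms, 100 Mbps: {transfer_seconds(1, 50, 100):.2f} s")
print(f"1000 MB, 50 ms, 100 Mbps: {transfer_seconds(1000, 50, 100):.1f} s")
```

This is why interactive apps feel sluggish on a high-latency link no matter how fast the connection is rated, while bulk downloads care almost entirely about bandwidth.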
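And for the Checksum / Hash entry, a minimal sketch using hashlib from Python's standard library: if the hash computed after a transfer matches the one computed before, the file arrived intact. The file name in the usage comment is hypothetical.

```python
import hashlib

def sha256_of_file(path: str) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical usage: compare against the checksum the sender published.
# assert sha256_of_file("model.safetensors") == published_checksum
```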
Networking & Performance
API (Application Programming Interface) — Rules for programs to talk to each other, locally or via cloud.
Proprietary API / SDK — Interfaces owned by one vendor; easy to start, hard to switch away from.
API Gateway — A single entry point that manages and secures multiple APIs.
Serverless — Cloud model where the provider manages the servers for you; you pay only while your code runs.
Load Balancer — Distributes traffic evenly across servers.
CDN (Content Delivery Network) — Network of cached servers that bring data closer to users.
QoS (Quality of Service) — Prioritizing certain network traffic for steady performance.
WebSocket / Stream — Continuous data connection used for chat, gaming, or live inference.
Burst Scaling — Temporarily adding capacity for spikes in demand, then scaling down.
Cost & Operations
Pay-as-you-go (PAYG) — Billing only for what you use; predictable for small tests, volatile at scale.
Reserved Instance — Discounted VM reserved for 1–3 years; trades flexibility for lower cost.
Spot Instance / Pre-emptible VM — Cheap temporary compute that can be reclaimed anytime.
Total Cost of Ownership (TCO) — All-in cost over time: hardware, energy, cloud fees, support, and staff (break-even example at the end of this section).
CapEx vs. OpEx — Capital Expenditure (buy hardware) vs. Operating Expenditure (rent cloud).
Cost Calculator (e.g., AWS Calculator, Infracost) — Tools to estimate compute, storage, and transfer spend.
Managed Service — Provider-run platform (e.g., database, ML pipeline); convenient but adds dependency.
Abstraction Layer — Software that hides infrastructure details to simplify use or multi-cloud setups.
Hybrid Architecture — Using both local and cloud resources for balance.
Multi-Cloud — Using multiple cloud providers to avoid lock-in, with added complexity.
Egress Fee — Charge for moving data out of a cloud; often unnoticed until billing.
Elastic IP — Static public IP for a VM; may incur charges when idle.
Energy Footprint — The electricity used by your workloads, local or cloud.
Watt (W) / Kilowatt-hour (kWh) — Instant power vs. energy over time; utility bills use kWh.
Overhead — Indirect costs (maintenance, support, redundancy) not visible in per-use pricing.
SLA (Service Level Agreement) — Provider’s formal uptime and reliability guarantee.
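To ground the TCO and CapEx vs. OpEx entries above, here is a simplified buy-versus-rent break-even sketch. Every price in it (hardware cost, cloud hourly rate, electricity) is an illustrative assumption, and it ignores staff time, cooling, and resale value.

```python
# Simplified buy-vs-rent break-even; every number here is an assumption.
gpu_cost = 1800.00     # USD, one-time hardware purchase (CapEx)
cloud_rate = 1.10      # USD per GPU-hour rented (OpEx)
power_kw = 0.35        # local GPU draw under load, in kilowatts
electricity = 0.15     # USD per kWh on the utility bill

local_hourly = power_kw * electricity                # running cost per hour
breakeven = gpu_cost / (cloud_rate - local_hourly)   # hours until buying wins

print(f"local running cost: ${local_hourly:.3f}/hour")
print(f"break-even after ~{breakeven:,.0f} GPU-hours of use")
```

Under these made-up numbers, buying pays off after roughly 1,700 GPU-hours; light or bursty workloads rarely get there, which is the usual argument for renting.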
Governance, Privacy & Risk
Digital Sovereignty — Control over where your data lives, who accesses it, and how it’s used.
Compliance / Auditing — Meeting legal or industry rules and keeping records.
Vendor Lock-In — When proprietary tools or formats make switching providers difficult.
Attack Surface — All the possible points an attacker could exploit.
Visibility / Observability — Ability to see how systems behave via logs and metrics.
Data Governance — Policies controlling data access and handling.
Zero Trust — Security model that verifies every request instead of assuming safety.
Encryption at Rest / In Transit — Protects stored or transmitted data with encryption (see the sketch at the end of this section).
Federated Learning — Training shared models across many devices without moving data.
Compliance Frameworks (GDPR, HIPAA, ISO 27001) — Standard rule sets for privacy and security.
Data Lineage — Traceable record of where data came from and how it changed.
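As a small illustration of the Encryption at Rest entry above, here is a sketch using the Fernet recipe from the third-party cryptography package (installed with pip install cryptography). Key management, the genuinely hard part in practice, is reduced to a comment.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# Encrypt data before writing it to disk; only the key holder can read it.
key = Fernet.generate_key()   # in practice, keep this in a key manager
box = Fernet(key)

ciphertext = box.encrypt(b"patient record 1234")   # safe to store anywhere
plaintext = box.decrypt(ciphertext)                # requires the key

assert plaintext == b"patient record 1234"
```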
Tools & Workflows (Beginner-Friendly)
Ollama — Simple tool to download and run local LLMs with minimal setup (see the example at the end of this section).
LM Studio — Desktop app for running and testing local models via a GUI.
ComfyUI — Visual workflow builder for image generation and AI pipelines.
RunPod / Lambda Labs / Vast.ai — Marketplaces for renting GPUs by the hour, typically with transparent pricing.
ONNX (Open Neural Network Exchange) — Standard format for sharing models between tools.
Weights & Biases / MLflow — Tools for tracking experiments and model performance.
Prometheus / Grafana — Monitoring and visualization tools for infrastructure metrics.
Community Cloud / Shared Compute — University or co-op servers with transparent governance.
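Finally, to show what an Inference Server looks like from the caller's side, here is a sketch that sends a prompt to Ollama's local HTTP endpoint using only Python's standard library. It assumes Ollama is running on its default port and that a model named llama3 has already been pulled; substitute whatever model you actually have.

```python
import json
import urllib.request

# Call a locally running Ollama server (default port 11434).
# Assumes `ollama pull llama3` has been run; change the model name as needed.
payload = {
    "model": "llama3",
    "prompt": "Define VRAM in one sentence.",
    "stream": False,   # return one JSON object instead of a token stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```

The same pattern, an HTTP POST to localhost, is what LM Studio and most inference servers expose, which is why apps can swap a local model for a cloud API with little code change.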
Emerging Concepts (Optional for Further Reading)
Edge-to-Cloud Continuum — The spectrum between device-level and cloud computing.
Digital Twin — A virtual model of a real system for simulation or testing.
Embodied AI — AI integrated with robots or sensors that interact with the physical world.
Lifecycle Assessment (LCA) — Measuring the environmental impact of hardware or software over time.
Compute Credit / Carbon Credit — Units used to offset or account for energy consumption in AI operations.
Quick Cross-References
Thinking “Where do I run this?” → Hybrid Architecture, Latency, TCO
Thinking “How do I customize a model cheaply?” → LoRA, PEFT, Quantization
Thinking “Why is my bill high?” → PAYG, Data Egress, Managed Service, TCO
Thinking “How do I keep control?” → Digital Sovereignty, Data Residency, Vendor Lock-In, Visibility