Chapter 7: AI Under the Hood
Overview
AI can feel mysterious from the outside, but underneath it is built from math, models, and computing hardware working together. This chapter breaks down the major components that make modern AI systems work.
Understanding these pieces doesn’t require deep technical knowledge — just a sense of how information flows through an AI system to produce the outputs you see.
How AI Systems Work

Neural Networks
Neural networks are computational structures loosely inspired by how neurons connect in the brain. They process information layer by layer, learning patterns from data as they are trained.
Early neural networks had only a few layers. Today’s systems may have dozens or hundreds, enabling them to recognize complex patterns in text, images, or audio.
Neural networks are the foundation of nearly all modern AI — from voice assistants to recommendation algorithms to large language models.
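To make "layer by layer" concrete, here is a minimal sketch of a tiny network in Python with NumPy. The weights and inputs are random placeholders, not values from any real trained model; the point is only to show data flowing through stacked layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, weights, bias):
    """One neural-network layer: a weighted sum followed by a simple
    nonlinearity (ReLU), which lets stacked layers learn complex patterns."""
    return np.maximum(0, x @ weights + bias)

# A toy "network": 4 input features -> 8 hidden units -> 2 outputs.
x = rng.normal(size=(1, 4))             # one example with 4 features
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

hidden = layer(x, w1, b1)               # first layer transforms the input
output = hidden @ w2 + b2               # second layer produces the final scores
print(output)
```

Real systems stack many such layers and learn the weights from data rather than drawing them at random.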
Transformers
Transformers, introduced in the 2017 paper "Attention Is All You Need," revolutionized AI by allowing models to understand context across long sequences of text or other data.
Unlike older recurrent networks that processed words one at a time, transformers use a mechanism called attention to look at an entire passage at once, letting each word weigh its relevance to every other word. This allows them to capture meaning, relationships, and structure far more effectively, even across long stretches of text.
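As a rough illustration, the sketch below computes scaled dot-product attention over a handful of word vectors with NumPy. The vectors are random stand-ins; real transformers learn separate query, key, and value projections plus many other refinements.

```python
import numpy as np

def attention(queries, keys, values):
    """Scaled dot-product attention: every position scores its relevance
    to every other position, then takes a weighted average of the values."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)          # similarity between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax -> attention weights
    return weights @ values

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 16))   # 5 "words", each a 16-dimensional vector
out = attention(seq, seq, seq)   # self-attention: the sequence attends to itself
print(out.shape)                 # (5, 16): each word now carries context from the others
```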
Transformers power today’s most capable systems — including ChatGPT, Claude, Gemini, and many multimodal models.
Models
An AI model is the learned mathematical structure that produces outputs — such as text, images, predictions, or classifications. It is created by training a neural network on large amounts of data.
Models adjust billions (or even trillions) of internal parameters during training. These parameters encode the patterns the model has learned.
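"Adjusting parameters" boils down to nudging numbers until an error score shrinks. Below is a deliberately tiny sketch of that loop, fitting a single weight to toy data with gradient descent; real training repeats the same idea across billions of parameters at once.

```python
# Fit y = w * x to toy data by gradient descent: one parameter instead of billions.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # inputs and targets (the true w is 2)
w = 0.0                                        # the single "parameter"
learning_rate = 0.05

for step in range(200):
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad                  # nudge the parameter downhill

print(round(w, 3))   # close to 2.0: the pattern "learned" from the data
```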
Examples of models include:
- Large Language Models (LLMs)
- Vision models for image classification
- Audio models for speech recognition
- Multimodal models capable of text + image + audio understanding
Large Language Models (LLMs)
LLMs are a specific type of model trained on massive collections of text. They work by predicting the most likely next word (or token) in a sequence.
This simple mechanism, scaled up across huge datasets and enormous computing power, enables capabilities like conversation, summarization, translation, code generation, and reasoning.
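Under the hood, "predicting the most likely next token" means turning raw scores into a probability distribution over a vocabulary. The sketch below does this with made-up scores for a toy five-word vocabulary; a real LLM produces such scores for tens of thousands of tokens at every step.

```python
import numpy as np

vocab = ["cat", "dog", "mat", "sat", "the"]
# Made-up scores (logits) a model might assign after "the cat sat on the".
logits = np.array([0.2, 0.1, 3.5, 0.3, 1.0])

probs = np.exp(logits - logits.max())
probs /= probs.sum()                      # softmax: scores -> probabilities

for word, p in zip(vocab, probs):
    print(f"{word:>4}: {p:.2f}")
print("prediction:", vocab[int(np.argmax(probs))])   # "mat"
```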
Modern frontier LLMs can also work with images, audio, and video — making them multimodal.
Multimodal AI
Multimodal AI can understand and generate more than one type of data — such as text, images, audio, and video.
This unlocks abilities like:
- Describing images
- Interpreting charts
- Reading handwriting
- Turning sketches into designs
- Answering questions about videos
Multimodality brings AI closer to understanding the world the way humans do — through multiple senses working together.
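One common way multimodal systems connect different kinds of data is to map each input, whether text or image, into a shared vector space and compare the vectors. The sketch below fakes that idea with random "embeddings" and cosine similarity; in a real system the encoders are neural networks learned from paired data.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_similarity(a, b):
    """How aligned two vectors are: 1.0 means pointing the same way."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for learned encoders: imagine these vectors came from a text
# encoder and an image encoder trained to share the same embedding space.
text_embedding = rng.normal(size=128)                               # "a photo of a dog"
image_embedding = text_embedding + rng.normal(scale=0.3, size=128)  # a matching image
unrelated_embedding = rng.normal(size=128)                          # an unrelated image

print(cosine_similarity(text_embedding, image_embedding))      # high: same concept
print(cosine_similarity(text_embedding, unrelated_embedding))  # near zero: unrelated
```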
Hardware: CPUs, GPUs, TPUs, and NPUs
AI systems run on specialized hardware designed for fast, parallel computation.
CPUs are general-purpose chips that handle everyday computing but are too slow on their own for large-scale AI training, which depends on massive parallelism.
GPUs (Graphics Processing Units) perform thousands of calculations simultaneously, making them ideal for training neural networks.
TPUs (Tensor Processing Units) and other custom chips are built specifically for AI workloads, accelerating matrix operations used in deep learning.
NPUs (Neural Processing Units) now ship inside laptops and smartphones, enabling on-device AI — faster, more private, and more efficient.
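The work these chips accelerate is mostly large matrix math. The sketch below, which assumes the PyTorch library is installed, runs the same matrix multiplication on the CPU and, if one is available, on a GPU; the speed gap grows dramatically as the matrices get larger.

```python
import torch

# Matrix multiplication is the core workload of deep learning.
a = torch.randn(2048, 2048)
b = torch.randn(2048, 2048)

result_cpu = a @ b                      # runs on the general-purpose CPU

if torch.cuda.is_available():           # use a GPU if one is present
    device = torch.device("cuda")
    result_gpu = a.to(device) @ b.to(device)
    print("ran on GPU:", result_gpu.device)
else:
    print("no GPU found; ran on CPU only")
```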
Power & Infrastructure
Modern AI requires enormous computing resources. Training large models means running thousands of GPUs or TPUs for weeks or months.
This consumes significant electricity and cooling, making data centers and energy efficiency key parts of the AI ecosystem.
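To get a feel for the scale, here is a back-of-envelope estimate using illustrative assumptions rather than figures from any specific training run: a hypothetical cluster of 10,000 GPUs drawing roughly 700 watts each, running for 30 days.

```python
# Back-of-envelope energy estimate with illustrative, assumed numbers.
num_gpus = 10_000          # hypothetical cluster size
watts_per_gpu = 700        # rough draw of a modern data-center GPU
days = 30                  # hypothetical training duration

kilowatt_hours = num_gpus * watts_per_gpu * 24 * days / 1000
print(f"{kilowatt_hours:,.0f} kWh")   # about 5 million kWh, before cooling overhead
```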
AI is increasingly shaped by advances in infrastructure — from cloud clusters to distributed training to edge computing.
Autocomplete at Scale
At its core, a large language model is a sophisticated autocomplete engine. It predicts the next token based on everything it has learned during training.
This simple mechanism, scaled with billions of parameters and vast amounts of data, produces the complex behavior we associate with intelligent systems.
AI does not understand meaning the way humans do — but it can generate highly convincing patterns of language based on statistical likelihood.
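The sketch below shows "autocomplete" in its most stripped-down form: it counts which word follows which in a tiny text, then repeatedly picks the most likely next word. Real LLMs do something far richer, but the pick-the-likely-next-token loop is the same shape.

```python
from collections import Counter, defaultdict

text = "the cat sat on the mat and the cat slept on the mat".split()

# Count which word follows each word in the toy "training data".
next_word_counts = defaultdict(Counter)
for current, following in zip(text, text[1:]):
    next_word_counts[current][following] += 1

# Autocomplete: start with a word, then keep appending the most likely next word.
sentence = ["the"]
for _ in range(5):
    counts = next_word_counts[sentence[-1]]
    if not counts:
        break
    sentence.append(counts.most_common(1)[0][0])

print(" ".join(sentence))   # a plausible-looking continuation built purely from counts
```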
Key Takeaway
AI’s capabilities come from the combination of neural networks, transformers, hardware acceleration, and massive training datasets. While AI may feel magical, it is ultimately a product of math, computation, and engineering.
Understanding these components helps demystify AI and gives you a clearer sense of how modern systems generate the outputs you see.