The Mechanics of LLMs

Theory, Architecture and Practice for Engineers

Why this book?

As an engineer and Chief Information Officer, the author adopts an architectural and decision-making approach: not just “what a model does”, but “how” and “under what conditions” it integrates into an information system.

Since the emergence of the Transformer architecture, artificial intelligence has undergone a major shift: a language model is no longer a mysterious black box but an engineering architecture that can be understood.

This book dissects LLMs with the same rigour as a complex IT architecture. No magic promises: principles, equations and executable code, with an explicit IT decision-maker’s perspective.


Overview: 15 progressive chapters

Part I: Fundamentals (Chapters 1-3)

Mathematical and architectural foundations

  • Ch. 1 – Introduction to Natural Language Processing

    • Classic NLP vs modern approaches
    • Sequence prediction paradigm
  • Ch. 2 – Text Representation and Sequential Models

    • Tokenisation (BPE, WordPiece, SentencePiece)
    • Embeddings and vector representations
    • RNN, LSTM, GRU models
  • Ch. 3 – Transformer Architecture

    • Self-attention: formula, intuition, calculations (sketched in code after this list)
    • Multi-head attention and its benefits
    • Normalisation (LayerNorm) and residual connections
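
As a taste of Chapter 3, single-head scaled dot-product self-attention fits in a few lines. The sketch below is a minimal illustration (NumPy only, toy dimensions, variable names chosen for clarity); it is not one of the book's scripts:

  import numpy as np

  def softmax(x, axis=-1):
      x = x - x.max(axis=axis, keepdims=True)   # stabilise before exponentiating
      e = np.exp(x)
      return e / e.sum(axis=axis, keepdims=True)

  def self_attention(X, Wq, Wk, Wv):
      """Attend every position of X to every other position."""
      Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project to queries, keys, values
      d_k = Q.shape[-1]
      scores = Q @ K.T / np.sqrt(d_k)           # similarities, scaled by sqrt(d_k)
      weights = softmax(scores, axis=-1)        # each row sums to 1
      return weights @ V                        # weighted mix of value vectors

  rng = np.random.default_rng(0)
  X = rng.normal(size=(4, 8))                   # sequence of 4 tokens, width 8
  Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
  print(self_attention(X, Wq, Wk, Wv).shape)    # (4, 8)

Multi-head attention (also Chapter 3) runs several such heads in parallel on lower-dimensional projections and concatenates the results.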

Part II: Architecture & Optimisation (Chapters 4-8)

Building and training at scale

  • Ch. 4 – Transformer-Derived Models

    • BERT, GPT, T5: architectures and applications
    • Vision Transformers (ViT)
  • Ch. 5 – Architectural Optimisations

    • Linear attention and approximations
    • Key-Value Cache and efficient inference (see the sketch after this list)
  • Ch. 6 – Mixture-of-Experts (MoE) Architecture

    • Routing algorithms
    • Scaling laws with MoE
  • Ch. 7 – LLM Pre-training

    • Pre-training objectives
    • Data, tokenisation, and loss functions
    • Scaling laws: compute vs data vs model size
  • Ch. 8 – Training Optimisations

    • Activation (gradient) checkpointing
    • Distributed training: DDP, FSDP
    • Optimisers: Adam, AdamW, modern variations
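
To make Chapter 5's Key-Value Cache concrete: during autoregressive decoding, each past token's key and value are computed once and stored, so a new token computes only one attention row instead of re-attending from scratch. A minimal sketch under the same illustrative assumptions as above:

  import numpy as np

  rng = np.random.default_rng(0)
  d = 8
  Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
  K_cache = np.empty((0, d))                    # grows by one row per token
  V_cache = np.empty((0, d))

  def decode_step(x):
      """Attend from the newest token x over all cached positions."""
      global K_cache, V_cache
      K_cache = np.vstack([K_cache, x @ Wk])    # this token's key, computed once
      V_cache = np.vstack([V_cache, x @ Wv])    # ... and its value
      scores = (x @ Wq) @ K_cache.T / np.sqrt(d)
      w = np.exp(scores - scores.max())
      w /= w.sum()
      return w @ V_cache                        # one attention row, not a full matrix

  for _ in range(5):                            # five simulated decoding steps
      out = decode_step(rng.normal(size=d))
  print(out.shape)                              # (8,)

The cache trades memory for compute: per-step attention cost drops from quadratic in the sequence length to linear, which is why it is central to efficient inference.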

Part III: Learning & Alignment (Chapters 9-12)

From raw model to useful assistant

  • Ch. 9 – Supervised Fine-Tuning (SFT)

    • Instruction tuning
    • LoRA and QLoRA: parameter reduction (LoRA sketched after this list)
    • Resource-efficient fine-tuning
  • Ch. 10 – Alignment with Human Preferences

    • RLHF (Reinforcement Learning from Human Feedback)
    • Reward models and their challenges
    • Implicit vs explicit preferences
  • Ch. 11 – Generation and Inference Strategies

    • Sampling, temperature, top-k and top-p (see the sketch after this list)
    • Beam search and guided generation
    • Logits processors and constraints
  • Ch. 12 – Reasoning Models

    • Chain-of-Thought (CoT)
    • Tree-of-Thought (ToT)
    • Self-consistency and majority voting
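
Chapter 9's LoRA idea in one picture: freeze the pre-trained weight W and learn a low-rank update B·A with rank r far smaller than the layer width, so the trainable parameter count collapses. A minimal NumPy sketch with illustrative shapes; it is not the book's 08_lora_finetuning_example.py:

  import numpy as np

  rng = np.random.default_rng(0)
  d_in, d_out, r = 1024, 1024, 8

  W = rng.normal(size=(d_in, d_out))            # frozen pre-trained weight
  A = rng.normal(size=(d_in, r)) * 0.01         # trainable low-rank factor
  B = np.zeros((r, d_out))                      # trainable, zero-init: update starts at 0

  def lora_forward(x, alpha=16):
      return x @ W + (alpha / r) * (x @ A @ B)  # frozen path + scaled low-rank path

  x = rng.normal(size=(1, d_in))
  print(lora_forward(x).shape)                  # (1, 1024)
  full, lora = W.size, A.size + B.size
  print(f"trainable: {lora:,} vs {full:,} ({100 * lora / full:.2f}%)")

Here only about 1.6% of the layer's parameters are trained, which is the whole point of parameter-efficient fine-tuning.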
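
Chapter 11's decoding knobs compose naturally: temperature rescales the logits, top-k keeps only the k most likely tokens, and top-p (nucleus) keeps the smallest set whose probability mass reaches p. A minimal illustrative sketch, distinct from the book's 03_temperature_softmax.py:

  import numpy as np

  def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
      """Sample one token id after temperature / top-k / top-p filtering."""
      rng = rng or np.random.default_rng()
      logits = np.asarray(logits, dtype=float) / temperature   # T<1 sharpens, T>1 flattens
      if top_k is not None:                     # drop everything below the k-th logit
          cutoff = np.sort(logits)[-top_k]
          logits = np.where(logits < cutoff, -np.inf, logits)
      probs = np.exp(logits - logits.max())
      probs /= probs.sum()
      if top_p is not None:                     # nucleus: smallest set with mass >= p
          order = np.argsort(probs)[::-1]
          below = np.cumsum(probs[order]) - probs[order] < top_p
          mask = np.zeros_like(probs, dtype=bool)
          mask[order[below]] = True
          probs = np.where(mask, probs, 0.0)
          probs /= probs.sum()
      return rng.choice(len(probs), p=probs)

  print(sample([2.0, 1.0, 0.5, -1.0], temperature=0.7, top_k=3, top_p=0.9))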

Part IV: Agentic Ecosystem (Chapters 13-15)

Deployment and autonomous use

  • Ch. 13 – Augmented Systems and RAG

    • Retrieval-Augmented Generation (retrieval sketched after this list)
    • Vector databases and similarity search
    • Chunking strategies and indexing
  • Ch. 14 – Standard Agentic Protocols (MCP)

    • Model Context Protocol
    • Tool calling and function definitions
    • Agent loops and orchestration
  • Ch. 15 – Critical Evaluation of Agentic Flows

    • Quality metrics (BLEU, ROUGE, BERTScore)
    • Evaluation frameworks
    • Limitations and hallucinations
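
The retrieval half of Chapter 13's RAG pipeline is nearest-neighbour search in embedding space. In the minimal sketch below, a deterministic toy embedding stands in for a real embedding model and cosine similarity ranks the chunks; it is illustrative only, while the book's 04_rag_minimal.py covers the full pipeline:

  import zlib
  import numpy as np

  def embed(text, dim=64):
      """Toy embedding: seeded random unit vector; a real system calls a model."""
      rng = np.random.default_rng(zlib.crc32(text.encode()))
      v = rng.normal(size=dim)
      return v / np.linalg.norm(v)

  docs = [
      "Transformers rely on self-attention.",
      "LoRA adapts models with low-rank updates.",
      "The KV cache speeds up autoregressive decoding.",
  ]
  index = np.stack([embed(d) for d in docs])    # one unit vector per chunk

  def retrieve(query, k=2):
      scores = index @ embed(query)             # cosine similarity (unit vectors)
      top = np.argsort(scores)[::-1][:k]
      return [(docs[i], float(scores[i])) for i in top]

  for doc, score in retrieve("How does attention work?"):
      print(f"{score:+.3f}  {doc}")

In production the index lives in a vector database, and the chunking strategy (also Chapter 13) strongly influences retrieval quality.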

Included resources

9 Executable Python Scripts

All theoretical concepts are illustrated with working code:

  • 01_tokenization_embeddings.py — Tokenisation and vectors
  • 02_multihead_attention.py — Self-attention in detail
  • 03_temperature_softmax.py — Sampling and temperature
  • 04_rag_minimal.py — Minimal RAG pipeline
  • 05_pass_at_k_evaluation.py — Model evaluation with pass@k (sketched after this list)
  • 06_react_agent_bonus.py — ReAct agents
  • 07_llamaindex_rag_advanced.py — Advanced RAG
  • 08_lora_finetuning_example.py — LoRA and fine-tuning
  • 09_mini_assistant_complet.py — Complete integrated mini-assistant

All scripts:

  • ✅ Executable without any external API (demo/simulation mode)
  • ✅ Documented and explained line by line
  • ✅ Compatible with Python 3.9+
  • ✅ Freely available on GitHub
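
As an example of what evaluation means in practice: the pass@k metric behind 05_pass_at_k_evaluation.py has a standard unbiased estimator, popularised by the HumanEval benchmark. Given n generated samples of which c pass the tests, it is the probability that at least one of k randomly drawn samples passes. A short sketch:

  from math import comb

  def pass_at_k(n, c, k):
      """Unbiased pass@k: 1 - C(n-c, k) / C(n, k)."""
      if n - c < k:                 # fewer failures than draws: a pass is guaranteed
          return 1.0
      return 1.0 - comb(n - c, k) / comb(n, k)

  print(pass_at_k(n=20, c=5, k=1))  # 0.25, the raw pass rate
  print(pass_at_k(n=20, c=5, k=5))  # ≈ 0.81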

Book characteristics

  • Author: Mustapha Alouani
  • Pages: 153
  • Chapters: 15 technical chapters
  • Format: 6 × 9 inches
  • Language: French (English edition available)
  • Audience: Engineers, advanced students, technical leaders
  • Prerequisites: Probability, linear algebra, Python practice
  • Level: Intermediate → advanced
  • Status: ✅ Published (2025)

Who is this book for?

  • Engineers wanting to understand LLMs beyond an API
  • Students in computer science, ML and AI seeking a rigorous resource
  • Data scientists transitioning to LLMs
  • Technical leaders needing to integrate LLMs
  • Researchers in NLP and ML looking for a reference
  • Developers curious about what happens “under the hood”

Not recommended for: readers just looking to “use ChatGPT”.


What the reader gains

After reading this book, the reader will be able to:

  • Explain how a Transformer really works
  • Analyse trade-offs between quality and computational cost
  • Justify architectural choices (number of layers, heads, hidden size)
  • Evaluate an AI system critically
  • Implement key concepts in code
  • Argue in a structured way in technical discussions
  • Make informed decisions about using LLMs in an information system

How to get the book

Available via the Kindle ecosystem (e-reader, tablet or computer) or in paperback format.


Author’s note

This book was born from a recurring need observed among technical teams and decision-makers: to understand what is really happening behind a language model API, in order to make informed decisions. It is designed to be read with pen in hand, taking time to follow the reasoning, formulas and code.

It is an engineering book, oriented towards decision-making. It is aimed at those who build systems as well as those who decide on their use.

Mustapha Alouani


English edition available on Amazon (paperback and Kindle).