The Mechanics of LLMs
Theory, Architecture and Practice for Engineers

Why this book?
As an engineer and Chief Information Officer, the author adopts an architectural and decision-making approach: not just “what a model does”, but “how” and “under what conditions” it integrates into an information system.
Since the emergence of Transformers, artificial intelligence has undergone a major disruption. A language model is no longer a mysterious black box: it is an understandable engineering architecture.
This book dissects LLMs with the same rigour as a complex IT architecture. No magic promises: principles, equations and executable code, with an explicit IT decision-maker’s perspective.
Overview: 15 progressive chapters
Part I: Fundamentals (Chapters 1-3)
Mathematical and architectural foundations
Ch. 1 – Introduction to Natural Language Processing
- Classic NLP vs modern approaches
- Sequence prediction paradigm
Ch. 2 – Text Representation and Sequential Models
- Tokenisation (BPE, WordPiece, SentencePiece)
- Embeddings and vector representations (see the sketch after this chapter's outline)
- RNN, LSTM, GRU models
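To give a concrete feel for tokenisation and embeddings before Chapter 2, here is a minimal sketch, assuming a toy whitespace tokeniser and a random NumPy embedding table; the vocabulary, dimensions and values are illustrative and are not taken from the book's scripts:

```python
# Toy illustration: token ids and embedding lookup (illustrative values only).
import numpy as np

vocab = {"<unk>": 0, "the": 1, "model": 2, "predicts": 3, "tokens": 4}
d_model = 8                                  # assumed embedding width for the demo
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_model))  # one vector per token id

def encode(text: str) -> list[int]:
    """Whitespace 'tokeniser' standing in for BPE/WordPiece/SentencePiece."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]

ids = encode("The model predicts tokens")
vectors = embedding_table[ids]               # shape: (sequence_length, d_model)
print(ids)                                   # [1, 2, 3, 4]
print(vectors.shape)                         # (4, 8)
```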
Ch. 3 – Transformer Architecture
- Self-attention: formula, intuition, calculations (sketched in code below)
- Multi-head attention and its benefits
- Normalisation (LayerNorm) and residual connections
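As an illustration of the self-attention formula covered in Chapter 3, here is a minimal NumPy sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V; the shapes and random inputs are assumptions for the demo, not code from the book:

```python
# Minimal scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (seq, seq) similarity scores
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # weighted mixture of the value vectors

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
print(attention(Q, K, V).shape)               # (4, 8)
```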
Part II: Architecture & Optimisation (Chapters 4-8)
Building and training at scale
Ch. 4 – Transformer-Derived Models
- BERT, GPT, T5: architectures and applications
- Vision Transformers (ViT)
Ch. 5 – Architectural Optimisations
- Linear attention and approximations
- Key-Value Cache and efficient inference
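To make the Key-Value Cache idea tangible: during autoregressive decoding, the keys and values of past tokens are stored once and reused at every new step instead of being recomputed. The sketch below is a simplified illustration under that assumption (toy vectors, a single attention head), not the book's implementation:

```python
# Illustrative KV cache: append one step's keys/values instead of recomputing the past.
import numpy as np

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k_step, v_step):
        self.keys.append(k_step)      # key vector of the newly generated token
        self.values.append(v_step)    # value vector of the newly generated token

    def as_arrays(self):
        return np.stack(self.keys), np.stack(self.values)

cache = KVCache()
rng = np.random.default_rng(0)
d_k = 8
for step in range(5):                 # pretend we decode 5 tokens
    cache.append(rng.normal(size=d_k), rng.normal(size=d_k))

K, V = cache.as_arrays()
print(K.shape, V.shape)               # (5, 8) (5, 8): step 6 reuses these directly
```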
Ch. 6 – Mixture-of-Experts (MoE) Architecture
- Routing algorithms (a top-k routing sketch follows this list)
- Scaling laws with MoE
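As a rough illustration of expert routing, the sketch below assumes a softmax gating projection and top-k expert selection with toy NumPy weights; real MoE layers add load-balancing losses and capacity limits that are omitted here:

```python
# Illustrative top-k routing for a Mixture-of-Experts layer (toy sizes, not the book's code).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]  # toy expert weights
router = rng.normal(size=(d_model, n_experts))                             # gating projection

x = rng.normal(size=d_model)                  # one token's hidden state
gate_probs = softmax(x @ router)              # probability of sending the token to each expert
chosen = np.argsort(gate_probs)[-top_k:]      # indices of the top-k experts

# Output is the gate-weighted sum of the chosen experts only; the others stay idle.
y = sum(gate_probs[i] * (x @ experts[i]) for i in chosen)
print(chosen, y.shape)
```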
Ch. 7 – LLM Pre-training
- Pre-training objectives
- Data, tokenisation, and loss functions
- Scaling laws: compute vs data vs model size
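For the scaling-law discussion, a common back-of-the-envelope rule is that training compute is roughly 6·N·D FLOPs for N parameters and D tokens, with the Chinchilla heuristic suggesting D on the order of 20·N; the short sketch below only illustrates that arithmetic and is not taken from the book:

```python
# Back-of-the-envelope scaling arithmetic (rules of thumb, illustrative only):
# training compute C ~ 6 * N * D FLOPs, Chinchilla heuristic D ~ 20 * N tokens.
def training_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens

n_params = 7e9                       # a 7B-parameter model
n_tokens = 20 * n_params             # compute-optimal token count under the heuristic
print(f"~{n_tokens:.1e} tokens, ~{training_flops(n_params, n_tokens):.2e} FLOPs")
```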
Ch. 8 – Training Optimisations
- Gradient checkpointing (activation recomputation)
- Distributed training: DDP, FSDP
- Optimisers: Adam, AdamW, modern variations
Part III: Learning & Alignment (Chapters 9-12)
From raw model to useful assistant
Ch. 9 – Supervised Fine-Tuning (SFT)
- Instruction tuning
- LoRA and QLoRA: reducing trainable parameters (sketched below)
- Resource-efficient fine-tuning
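To illustrate the LoRA idea from Chapter 9, the sketch below freezes a weight matrix W and learns only a low-rank correction B·A scaled by alpha/r; the sizes and initialisation are illustrative assumptions, not the book's fine-tuning script:

```python
# Illustrative LoRA update: freeze W, learn a low-rank correction B @ A (toy sizes).
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 16, 16, 4, 8         # rank r << d, alpha is a scaling constant

W = rng.normal(size=(d_out, d_in))           # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01        # trainable, small random init
B = np.zeros((d_out, r))                     # trainable, zero init: no change at start

def lora_forward(x):
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
print(lora_forward(x).shape)                 # (16,)
# Trainable parameters: A and B only, i.e. r * (d_in + d_out) = 128 instead of 256 for W.
```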
Ch. 10 – Alignment with Human Preferences
- RLHF (Reinforcement Learning from Human Feedback)
- Reward models and their challenges
- Implicit vs explicit preferences
Ch. 11 – Generation and Inference Strategies
- Sampling: temperature, top-k, top-p (see the sampling sketch below)
- Beam search and guided generation
- Logits processors and constraints
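As a concrete illustration of the decoding controls in Chapter 11, here is a minimal sampling sketch that applies temperature scaling, then top-k and top-p (nucleus) filtering, to a vector of logits; it is a simplified demo, not the book's 03_temperature_softmax.py script:

```python
# Illustrative sampling from logits with temperature, top-k and top-p (nucleus) filtering.
import numpy as np

def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=np.random.default_rng(0)):
    logits = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                 # token ids sorted by probability
    keep = np.ones_like(probs, dtype=bool)
    if top_k is not None:
        keep[order[top_k:]] = False                 # drop everything outside the top k
    if top_p is not None:
        cumulative = np.cumsum(probs[order])
        keep[order[cumulative > top_p]] = False     # drop the low-probability tail
        keep[order[0]] = True                       # always keep the most likely token
    probs = np.where(keep, probs, 0.0)
    probs /= probs.sum()                            # renormalise over the kept tokens
    return rng.choice(len(probs), p=probs)

print(sample([2.0, 1.0, 0.5, -1.0], temperature=0.7, top_k=3, top_p=0.9))
```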
Ch. 12 – Reasoning Models
- Chain-of-Thought (CoT)
- Tree-of-Thought (ToT)
- Self-consistency and majority voting
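Self-consistency ultimately reduces to sampling several reasoning paths and keeping the most frequent final answer; the toy sketch below assumes the answers have already been extracted from the sampled completions:

```python
# Illustrative self-consistency: sample several reasoning paths, keep the majority answer.
from collections import Counter

# Final answers extracted from, say, 7 sampled chain-of-thought completions (made-up values).
sampled_answers = ["42", "42", "41", "42", "40", "42", "41"]

votes = Counter(sampled_answers)
answer, count = votes.most_common(1)[0]
print(f"majority answer: {answer} ({count}/{len(sampled_answers)} votes)")
```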
Part IV: Agentic Ecosystem (Chapters 13-15)
Deployment and autonomous use
Ch. 13 – Augmented Systems and RAG
- Retrieval-Augmented Generation
- Vector databases and similarity search (retrieval step sketched below)
- Chunking strategies and indexing
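To show the retrieval step of a RAG pipeline in miniature, the sketch below ranks text chunks by cosine similarity using a toy bag-of-words embedding as a stand-in for a real embedding model; the chunks and query are invented for the demo:

```python
# Illustrative retrieval step of a RAG pipeline: rank chunks by cosine similarity.
import numpy as np

chunks = [
    "Transformers rely on self-attention.",
    "LoRA adds low-rank adapters to frozen weights.",
    "RAG retrieves documents before generating an answer.",
]
query = "Which approach retrieves documents before answering?"

def tokens(text):
    return [w.strip(".,?!").lower() for w in text.split()]

vocab = sorted({w for text in chunks + [query] for w in tokens(text)})

def embed(text):
    """Toy bag-of-words vector standing in for a real embedding model."""
    counts = np.array([tokens(text).count(w) for w in vocab], dtype=float)
    norm = np.linalg.norm(counts)
    return counts / norm if norm else counts

index = np.stack([embed(c) for c in chunks])   # the "vector database": one row per chunk
scores = index @ embed(query)                  # cosine similarities (vectors are unit-norm)
best = int(np.argmax(scores))
print(f"retrieved: {chunks[best]!r} (score {scores[best]:.2f})")
```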
Ch. 14 – Standard Agentic Protocols (MCP)
- Model Context Protocol
- Tool calling and function definitions
- Agent loops and orchestration
Ch. 15 – Critical Evaluation of Agentic Flows
- Quality metrics (BLEU, ROUGE, BERTScore)
- Evaluation frameworks
- Limitations and hallucinations
Included resources
9 Executable Python Scripts
All theoretical concepts are illustrated with working code:
- 01_tokenization_embeddings.py — Tokenisation and vectors
- 02_multihead_attention.py — Self-attention in detail
- 03_temperature_softmax.py — Sampling and temperature
- 04_rag_minimal.py — Minimal RAG pipeline
- 05_pass_at_k_evaluation.py — Model evaluation
- 06_react_agent_bonus.py — ReAct agents
- 07_llamaindex_rag_advanced.py — Advanced RAG
- 08_lora_finetuning_example.py — LoRA and fine-tuning
- 09_mini_assistant_complet.py — Complete mini-assistant (end-to-end integration)
All scripts:
- ✅ Executable without external API (demo/simulation mode)
- ✅ Documented and explained line by line
- ✅ Compatible with Python 3.9+
- ✅ Freely available on GitHub
Book characteristics
| Aspect | Details |
|---|---|
| Author | Mustapha Alouani |
| Pages | 153 |
| Chapters | 15 technical chapters |
| Format | 6 × 9 inches |
| Language | French (English edition available) |
| Audience | Engineers, advanced students, technical leaders |
| Prerequisites | Probability, linear algebra, Python practice |
| Level | Intermediate → advanced |
| Status | ✅ Published (2025) |
Who is this book for?
✅ Engineers wanting to understand LLMs beyond an API
✅ Students in computer science, ML, AI: a rigorous resource
✅ Data Scientists transitioning to LLMs
✅ Technical leaders needing to integrate LLMs
✅ Researchers in NLP and ML looking for a reference
✅ Developers curious about what happens “under the hood”
❌ Not recommended for: Readers just looking to “use ChatGPT”
What the reader gains
After reading this book, the reader will be able to:
- Explain how a Transformer really works
- Analyse trade-offs between quality and computational cost
- Justify architectural choices (number of layers, heads, hidden size)
- Evaluate an AI system critically
- Implement key concepts in code
- Argue in a structured way in technical discussions
- Make informed decisions about using LLMs in an information system
How to get the book
Available via the Kindle ecosystem (e-reader, tablet or computer) or in paperback format.
Additional resources
- Code and Scripts: GitHub – Mechanics of LLMs
- Blog: In-depth articles on AI and LLMs
- Newsletter: Technical news and insights
Author’s note
This book was born from a recurring need observed among technical teams and decision-makers: to understand what is really happening behind a language model API, in order to make informed decisions. It is designed to be read with pen in hand, taking time to follow the reasoning, formulas and code.
It is an engineering book, oriented towards decision-making. It is aimed at those who build systems as well as those who decide on their use.
Mustapha Alouani
English edition available on Amazon (paperback and Kindle).