C++ Primer
1. What C++ Actually Is
C++ is a compiled, statically-typed, multi-paradigm language that gives the programmer direct control over hardware resources (memory layout, cache behavior, thread placement) while providing high-level abstractions (classes, templates, lambdas) that cost nothing at runtime when used correctly.
The central philosophy is the zero-overhead abstraction principle: you don’t pay for what you don’t use, and what you do use, you couldn’t hand-code any better. A std::vector is as fast as a manually managed malloc/realloc array — but it also handles copying, moving, resizing, and cleanup automatically.
Translation model: A C++ program is compiled in translation units (.cpp files). Each .cpp is preprocessed (expanding #include, macros), compiled to an object file (.o), and then the linker combines all object files into an executable. Headers (.h/.hpp) contain declarations (telling the compiler “this function/class exists”); source files contain definitions (the actual code). Understanding this model explains most “undefined reference” or “multiple definition” linker errors.
// hello.cpp — minimal program
#include <iostream> // textual inclusion of I/O declarations
int main() { // entry point; returns int (0 = success)
std::cout << "Hello, World!\n";
return 0;
}
2. The Type System: Why It Matters
C++ is statically typed: every variable’s type is known at compile time, and the compiler enforces type safety (no silent conversion from std::string to int). This is not bureaucracy — it is the compiler catching bugs before the program runs.
2.1 Fundamental Types
| Category | Types | Typical Size |
|---|---|---|
| Boolean | bool | 1 byte |
| Character | char, wchar_t, char16_t, char32_t | 1 / 2–4 / 2 / 4 bytes |
| Integer | short, int, long, long long | 2 / 4 / 4–8 / 8 bytes |
| Floating | float, double, long double | 4 / 8 / 8–16 bytes |
| Void | void | — (incomplete type) |
Use <cstdint> for exact-width types when portability matters: int32_t is always 32 bits, uint64_t is always 64 bits unsigned. Use size_t for sizes and indices — it matches the platform’s pointer width (32-bit on x86, 64-bit on x64).
2.2 Type Qualifiers
- const: “I promise not to modify this.” A const int* p means the int can’t be changed through p (but p itself can point elsewhere). An int* const p means p can’t point elsewhere (but the int can change). const int* const p — both frozen. Being const-correct is not optional in professional C++; it documents intent and enables compiler optimizations.
- constexpr: Evaluated at compile time. constexpr int factorial(int n) { return n <= 1 ? 1 : n * factorial(n-1); } — when called in a constant expression, the compiler computes factorial(10) and embeds the result as a constant. No runtime cost.
- volatile: Tells the compiler “this value may change behind your back” (hardware registers, memory-mapped I/O). Do not use volatile for thread synchronization — it does not provide atomicity or memory ordering. Use std::atomic for that.
2.3 Initialization Styles
int a = 42; // copy initialization
int b(42); // direct initialization
int c{42}; // uniform (brace) initialization — PREFERRED
int d = {42}; // copy-list-initialization
Brace initialization {} is preferred because it prevents narrowing conversions: int x{3.14}; is a compile error (truncation), while int x = 3.14; silently truncates to 3. This catches bugs.
3. Pointers: The Most Important C++ Concept
A pointer is a variable that stores a memory address. Pointers are what make C++ powerful (direct hardware access, dynamic data structures, polymorphism) and dangerous (dangling pointers, wild writes, buffer overflows).
3.1 Basics
int x = 42;
int* p = &x; // p holds the address of x
std::cout << p; // prints the address (e.g., 0x7ffc3a2b1c04)
std::cout << *p; // dereferences p → prints 42
*p = 99; // modifies x through p → x is now 99
- &x: “address-of” operator. Returns the memory address where x lives.
- *p: “dereference” operator. Follows the address stored in p and accesses the value there.
- nullptr: The null pointer literal (C++11). Always initialize pointers: int* p = nullptr;. Dereferencing nullptr is undefined behavior (usually a segfault).
3.2 Pointer Arithmetic
Pointers understand the size of what they point to. If int* p points to arr[0], then p + 1 points to arr[1] (not arr[0] + 1 byte, but arr[0] + sizeof(int) bytes):
int arr[5] = {10, 20, 30, 40, 50};
int* p = arr; // arr decays to a pointer to its first element
std::cout << *(p + 2); // prints 30 (arr[2])
std::cout << p[3]; // equivalent to *(p + 3) → prints 40
This is how C-style arrays work: arr[i] is syntactic sugar for *(arr + i). Understanding this equivalence demystifies a lot of C/C++ behavior.
3.3 Pointers and Functions
void swap(int* a, int* b) {
int temp = *a;
*a = *b;
*b = temp;
}
int x = 1, y = 2;
swap(&x, &y); // x=2, y=1
Passing by pointer (or reference) lets a function modify the caller’s variables. Passing by value creates a copy — the callee works on its own copy, and the caller’s original is unchanged. In C++, prefer references over raw pointers for “I need to modify the caller’s value” because references can’t be null and don’t need dereferencing syntax.
3.4 Pointers to Pointers and Arrays of Pointers
int x = 10;
int* p = &x;
int** pp = &p; // pointer to pointer
std::cout << **pp; // 10
const char* names[] = {"Alice", "Bob", "Charlie"}; // array of pointers to strings
char** argv in int main(int argc, char** argv) is an array of pointers to C-strings — this is how command-line arguments arrive.
3.5 Function Pointers and Callbacks
int add(int a, int b) { return a + b; }
int (*op)(int, int) = &add; // function pointer
std::cout << op(3, 4); // prints 7
Function pointers enable callbacks — passing a function as an argument to another function. In modern C++, prefer std::function<int(int,int)> or lambdas for cleaner syntax and type safety.
4. References: The Safe Alternative
A reference is an alias — a second name for an existing object. Once bound, it cannot be reseated.
int x = 42;
int& ref = x; // ref IS x
ref = 99; // x is now 99
const int& cref = 42; // const ref can bind to a temporary (rvalue)
When to use references vs. pointers:
- Reference: when the target must exist (non-null) and doesn’t change (no reseating). Function parameters, return values.
- Pointer: when the target may not exist (nullptr), or you need pointer arithmetic, or you need to reseat.
- Smart pointer: when you need to express ownership (see Section 6).
4.1 Rvalue References and Move Semantics (C++11)
std::string a = "Hello";
std::string b = std::move(a); // b owns "Hello"; a is left in a valid but unspecified state
An rvalue reference (T&&) binds to temporaries (values about to be destroyed). Move semantics transfer ownership of resources (heap memory, file handles) instead of copying them. Moving a std::vector of 1 million elements is $O(1)$ (just swap three pointers) vs. $O(n)$ for copying. This is why returning large objects from functions is efficient in modern C++.
5. Stack vs. Heap: Memory Layout
Understanding where data lives is fundamental to writing correct and fast C++.
5.1 The Stack
The stack is a contiguous block of memory managed automatically by the compiler. Every time a function is called, a stack frame is pushed containing:
- Local variables
- Function arguments (or references/pointers to them)
- Return address
When the function returns, the frame is popped — all local variables are destroyed. This is automatic storage duration.
void foo() {
int x = 42; // x lives on the stack
double arr[100]; // 800 bytes on the stack
} // x and arr are destroyed here — no manual cleanup
Why the stack is fast: Allocating on the stack is just moving the stack pointer (a single instruction). No bookkeeping, no fragmentation, no system calls. This is why stack allocation is ~100× faster than heap allocation.
The danger: Stack size is limited (typically 1–8 MB). double big[1000000]; (8 MB) will overflow the stack → crash with SIGSEGV. Large or variable-size data belongs on the heap.
5.2 The Heap
The heap (or free store) is a large pool of memory managed by the runtime allocator (malloc/free in C, new/delete in C++). Objects on the heap persist until explicitly freed.
int* p = new int(42); // allocate 4 bytes on the heap, initialize to 42
// ... p is valid here, even outside the allocating function
delete p; // free the memory. MUST be done, or memory leaks.
p = nullptr; // good practice: prevent dangling pointer use
Why the heap exists: For data whose size is unknown at compile time (user input, network messages), or whose lifetime must exceed the creating function (returning a dynamically built data structure).
Why the heap is dangerous:
- Memory leak: Forget delete → memory is never freed → program grows until OOM.
- Double free: Call delete twice → undefined behavior (corruption, crash).
- Dangling pointer: Use a pointer after delete → reading garbage or crashing.
- Fragmentation: Many small allocations/deallocations scatter memory, hurting cache performance.
5.3 Stack vs. Heap: Summary
| | Stack | Heap |
|---|---|---|
| Speed | Extremely fast (pointer bump) | Slower (allocator bookkeeping) |
| Size | Small (1–8 MB typical) | Large (limited by OS/RAM) |
| Lifetime | Automatic (scope-bound) | Manual (until delete or RAII) |
| Fragmentation | None | Possible |
| Thread safety | Each thread has its own stack | Shared — needs synchronization |
Rule of thumb: Default to stack. Use heap only when you need dynamic size, dynamic lifetime, or polymorphism (base-class pointers to derived objects).
6. Dynamic Memory and RAII
6.1 Raw new and delete (The Old Way)
int* arr = new int[1000]; // allocate array of 1000 ints on the heap
arr[0] = 42;
// ... use arr ...
delete[] arr; // MUST use delete[] for arrays (not delete)
The problem: if an exception is thrown between new and delete, the delete never runs → leak. If a function has multiple return paths, you must delete before each one. This is error-prone and the #1 source of bugs in legacy C++.
6.2 Smart Pointers (The Modern Way)
RAII (Resource Acquisition Is Initialization): Tie the resource lifetime to an object’s lifetime. When the object is destroyed (goes out of scope), its destructor frees the resource. Smart pointers implement RAII for heap memory:
std::unique_ptr<T> — Sole ownership. Cannot be copied (that would create two owners), but can be moved.
#include <memory>
auto p = std::make_unique<int>(42); // heap allocation, wrapped in unique_ptr
std::cout << *p; // use like a regular pointer
// no delete needed — when p goes out of scope, the int is freed
std::shared_ptr<T> — Shared ownership via reference counting. The object is destroyed when the last shared_ptr to it is destroyed.
auto p1 = std::make_shared<std::string>("Hello");
auto p2 = p1; // ref count = 2
p1.reset(); // ref count = 1, string still alive
// when p2 goes out of scope, ref count = 0, string is freed
std::weak_ptr<T> — Non-owning observer of a shared_ptr. Does not increment the ref count. Used to break reference cycles (e.g., parent-child relationships in trees/graphs where both point to each other).
The Rule: In modern C++ (C++11 and later), never write new or delete directly. Use std::make_unique or std::make_shared. If you see raw new/delete in production code, it is almost certainly a bug waiting to happen.
6.3 RAII Beyond Memory
RAII applies to any resource:
{
std::lock_guard<std::mutex> lock(mtx); // acquires mutex
// ... critical section ...
} // lock_guard destructor releases mutex — even if an exception is thrown
{
std::ofstream file("data.txt"); // opens file
file << "Hello";
} // ofstream destructor closes file — no fclose() needed
{
auto conn = db.connect(); // acquires DB connection from pool
conn.query("SELECT ...");
} // conn destructor returns connection to pool
This pattern eliminates resource leaks by design. If your class manages a resource, implement the Rule of Five (destructor, copy constructor, copy assignment, move constructor, move assignment) or, better, the Rule of Zero (compose your class from members that already handle their own resources — unique_ptr, string, vector — and the compiler-generated defaults do the right thing).
7. OOP: Classes, Inheritance, and Polymorphism
7.1 Classes and Encapsulation
class BankAccount {
private:
double balance_; // invariant: balance_ >= 0
std::string owner_;
public:
BankAccount(std::string owner, double initial)
: balance_(initial), owner_(std::move(owner)) { // initializers listed in member declaration order
if (initial < 0) throw std::invalid_argument("Negative initial balance");
}
void deposit(double amount) {
if (amount <= 0) throw std::invalid_argument("Non-positive deposit");
balance_ += amount;
}
bool withdraw(double amount) {
if (amount > balance_) return false; // insufficient funds
balance_ -= amount;
return true;
}
double balance() const { return balance_; } // const: doesn't modify state
};
Encapsulation means: the class owns its invariants. External code cannot set balance_ to $-1000$ because balance_ is private. All mutations go through deposit() and withdraw(), which enforce the rules. This is why we have private — not to hide things, but to guarantee correctness.
7.2 Inheritance and Polymorphism
class Shape {
public:
virtual double area() const = 0; // pure virtual → Shape is abstract
virtual ~Shape() = default; // MUST be virtual if using base pointers
};
class Circle : public Shape {
double r_;
public:
Circle(double r) : r_(r) {}
double area() const override { return 3.14159 * r_ * r_; }
};
class Rectangle : public Shape {
double w_, h_;
public:
Rectangle(double w, double h) : w_(w), h_(h) {}
double area() const override { return w_ * h_; }
};
// Polymorphic use:
void printArea(const Shape& s) {
std::cout << s.area() << "\n"; // calls the right version at runtime
}
Virtual dispatch uses a vtable (virtual function table): each polymorphic class has a hidden table of function pointers. When you call s.area(), the runtime looks up the correct function for the actual derived type. This costs one indirection (pointer lookup) per call — negligible in most code, but measurable in tight inner loops (which is why game engines sometimes avoid virtual calls in hot paths).
Always make the base class destructor virtual: If you delete a Shape* that actually points to a Circle, a non-virtual destructor would only call Shape::~Shape(), leaking Circle’s resources.
8. Templates and Generic Programming
Templates are C++’s mechanism for write-once, use-with-any-type code. They are resolved at compile time — no runtime overhead.
template<typename T>
T max_val(T a, T b) {
return (a > b) ? a : b;
}
// Usage:
max_val(3, 5); // T = int
max_val(3.14, 2.71); // T = double
max_val<std::string>("alpha", "beta"); // T = std::string
The compiler generates a separate specialization for each type used. max_val<int> and max_val<double> are two different functions in the binary. This is zero-overhead (no type erasure, no boxing), but can increase binary size if many types are used (code bloat).
8.1 Class Templates
template<typename T>
class Stack {
std::vector<T> data_;
public:
void push(const T& val) { data_.push_back(val); }
T pop() {
T top = std::move(data_.back());
data_.pop_back();
return top;
}
bool empty() const { return data_.empty(); }
};
Stack<int> intStack;
Stack<std::string> strStack;
std::vector, std::map, std::unique_ptr — all STL containers and smart pointers are class templates.
8.2 Lambdas (Anonymous Functions)
auto square = [](int x) { return x * x; };
std::cout << square(5); // 25
std::vector<int> v = {3, 1, 4, 1, 5};
std::sort(v.begin(), v.end(), [](int a, int b) { return a > b; });
// v is now {5, 4, 3, 1, 1}
Capture list []: controls what outer variables the lambda can access.
- [=]: capture everything by value (copy).
- [&]: capture everything by reference.
- [x, &y]: capture x by value, y by reference.
- [this]: capture the enclosing object’s this pointer.
Lambdas are the idiomatic way to pass behavior to STL algorithms, thread constructors, and async operations.
9. STL Containers and Algorithms
9.1 Choosing the Right Container
| Container | Underlying | Insert | Lookup | Iterate | Use When |
|---|---|---|---|---|---|
| vector | Dynamic array | $O(1)$ amortized back | $O(n)$ | Fastest (contiguous) | Default choice |
| deque | Chunked array | $O(1)$ front/back | $O(n)$ | Good | Need front insertion |
| list | Doubly-linked | $O(1)$ anywhere (with iterator) | $O(n)$ | Poor cache | Frequent mid-insert/erase |
| set/map | Red-black tree | $O(\log n)$ | $O(\log n)$ | In-order | Sorted access needed |
| unordered_set/map | Hash table | $O(1)$ average | $O(1)$ average | No order | Fast lookup, no ordering |
Default to std::vector. Even for 1000 elements, linear search in a vector often beats std::set lookup because contiguous memory is cache-friendly. Only switch to a tree/hash container when profiling shows it matters.
9.2 Algorithms
The <algorithm> header provides ~100 generic algorithms that operate on iterator ranges:
std::vector<int> v = {5, 3, 8, 1, 9};
std::sort(v.begin(), v.end()); // {1, 3, 5, 8, 9}
auto it = std::find(v.begin(), v.end(), 8); // iterator to 8
int sum = std::accumulate(v.begin(), v.end(), 0); // 26
std::transform(v.begin(), v.end(), v.begin(),
[](int x){ return x * 2; }); // {2, 6, 10, 16, 18}
bool all_pos = std::all_of(v.begin(), v.end(),
[](int x){ return x > 0; }); // true
Prefer algorithms over raw loops. std::sort is typically an introsort (hybrid quicksort/heapsort/insertion) that’s heavily optimized. Hand-rolling a sort loop is slower, longer, and buggier.
10. I/O: Streams, Files, and Performance
10.1 Stream I/O Basics
#include <iostream>
#include <fstream>
#include <sstream>
// Console
std::cout << "Enter a number: ";
int n;
std::cin >> n;
// File
std::ofstream out("output.txt");
out << "Result: " << n * 2 << "\n";
out.close(); // or let destructor handle it (RAII)
std::ifstream in("output.txt");
std::string line;
while (std::getline(in, line)) {
std::cout << line << "\n";
}
// String stream (in-memory formatting)
std::ostringstream oss;
oss << "value=" << 42;
std::string s = oss.str(); // "value=42"
10.2 I/O-Bound vs. CPU-Bound Work
This distinction is critical for designing efficient programs:
I/O-Bound: The program spends most of its time waiting — for disk reads, network responses, database queries, user input. The CPU is idle. Adding more CPU cores doesn’t help; you need:
- Asynchronous I/O (non-blocking reads/writes; epoll or io_uring on Linux).
- Event-driven architecture (single thread with an event loop — the Node.js model, or Boost.Asio in C++).
- Coroutines (C++20 co_await) to write async code that reads like synchronous code.
- A thread pool where many I/O-waiting tasks share a small number of threads.
CPU-Bound: The program spends most of its time computing — matrix multiplication, physics simulation, image processing, compression. Adding more cores helps directly:
- Parallel algorithms (std::execution::par in C++17).
- Manual threading with careful work partitioning.
- SIMD (Single Instruction, Multiple Data — process 4/8/16 values at once via intrinsics or auto-vectorization).
- Avoid false sharing (different threads writing to the same cache line).
Practical example — reading and processing a large file:
// BAD: I/O and CPU interleaved — CPU waits for I/O, I/O waits for CPU
for (auto& chunk : file) {
process(chunk); // CPU work stalls while next chunk loads
}
// BETTER: overlap I/O and computation with double buffering
// Thread A reads chunk N+1 while Thread B processes chunk N
std::thread reader([&]{ /* read chunks into buffer A/B alternately */ });
std::thread processor([&]{ /* process from buffer A/B alternately */ });
10.3 Buffered I/O Performance Tips
- std::ios::sync_with_stdio(false); — disconnects C++ streams from C stdio. Can speed up cin/cout by 10×. Add std::cin.tie(nullptr); to decouple cin from cout flushing.
- Prefer '\n' over std::endl. std::endl flushes the buffer on every line — devastating for bulk output.
- For binary files, use std::ios::binary mode and read()/write() with raw buffers instead of formatted <</>>.
- Memory-mapped I/O (mmap on Linux) lets you treat a file as an array in memory — the OS handles paging. Fastest for random-access reads of large files.
11. Concurrency: Threads
11.1 Creating Threads
#include <thread>
#include <iostream>
void worker(int id, int iterations) {
for (int i = 0; i < iterations; ++i) {
// do work
}
std::cout << "Worker " << id << " done\n";
}
int main() {
std::thread t1(worker, 1, 1000000);
std::thread t2(worker, 2, 1000000);
t1.join(); // wait for t1 to finish
t2.join(); // wait for t2 to finish
}
join() vs detach():
- join(): The calling thread blocks until the spawned thread finishes. Always use join() unless you have a very specific reason not to.
- detach(): The thread runs independently. When it finishes, its resources are cleaned up. But if main() exits while a detached thread is running, the behavior is undefined. Detaching is rarely the right choice.
Thread with a lambda (common pattern):
int result = 0;
std::thread t([&result]() {
result = expensive_computation();
});
t.join();
std::cout << result;
11.2 std::async and std::future
For simpler task-based parallelism:
#include <future>
auto future = std::async(std::launch::async, []() {
return expensive_computation();
});
// ... do other work while computation runs ...
int result = future.get(); // blocks until result is ready
std::async manages the thread lifecycle for you and propagates exceptions through the future (calling future.get() re-throws any exception from the async task).
11.3 Hardware Concurrency
unsigned int n = std::thread::hardware_concurrency();
// Returns the number of concurrent threads supported (e.g., 8 for a 4-core/8-thread CPU).
// May return 0 if the value cannot be determined — fall back to a sensible default.
// Use this to size your thread pool.
12. Synchronization: Mutex, Lock Guards, and Deadlocks
When multiple threads access shared data, data races occur: two threads read/write the same memory location without synchronization → undefined behavior (corrupted data, crashes, impossible debugging).
12.1 std::mutex
A mutex (mutual exclusion) is a lock that only one thread can hold at a time:
#include <mutex>
std::mutex mtx;
int shared_counter = 0;
void increment(int n) {
for (int i = 0; i < n; ++i) {
mtx.lock(); // acquire the lock — other threads block here
++shared_counter; // only one thread executes this at a time
mtx.unlock(); // release the lock
}
}
Problem: If an exception is thrown between lock() and unlock(), the mutex is never released → deadlock (all threads waiting forever).
12.2 std::lock_guard and std::unique_lock (RAII Locks)
void increment_safe(int n) {
for (int i = 0; i < n; ++i) {
std::lock_guard<std::mutex> lock(mtx); // acquires mutex
++shared_counter;
} // lock_guard destructor releases mutex — even if exception thrown
}
- std::lock_guard: Simple RAII wrapper. Locks in constructor, unlocks in destructor. Cannot be moved or re-locked. Use for simple critical sections.
- std::unique_lock: More flexible. Can be deferred-locked, timed, re-locked, or moved. Required for use with condition variables. Slightly more overhead.
- std::scoped_lock (C++17): Locks multiple mutexes simultaneously without deadlock risk (uses a deadlock-avoidance algorithm internally):
std::mutex m1, m2;
std::scoped_lock lock(m1, m2); // locks both — no risk of ABBA deadlock
12.3 Deadlock
Deadlock occurs when two or more threads each hold a lock and wait for the other’s lock:
Thread A: locks m1, then tries to lock m2 → blocks (m2 held by B)
Thread B: locks m2, then tries to lock m1 → blocks (m1 held by A)
Both wait forever.
Prevention:
- Lock ordering: Always acquire mutexes in the same global order.
- std::scoped_lock: Atomically acquires multiple mutexes.
- std::try_lock: Non-blocking attempt — if the lock isn’t available, do something else.
- Minimize lock scope: Hold the lock for as little time as possible.
12.4 Reader-Writer Locks
#include <shared_mutex>
std::shared_mutex rw_mtx;
void reader() {
std::shared_lock lock(rw_mtx); // multiple readers can hold simultaneously
// read shared data
}
void writer() {
std::unique_lock lock(rw_mtx); // exclusive — blocks until all readers release
// modify shared data
}
Use when reads vastly outnumber writes (e.g., a configuration map that’s read 1000×/sec and updated 1×/min).
13. Condition Variables: Waiting for Events
A condition variable lets a thread sleep until another thread signals that a condition is met. This is far more efficient than busy-waiting (spinning in a loop checking a flag).
#include <condition_variable>
std::mutex mtx;
std::condition_variable cv;
std::queue<int> work_queue;
bool done = false;
// Producer thread
void producer() {
for (int i = 0; i < 100; ++i) {
{
std::lock_guard<std::mutex> lock(mtx);
work_queue.push(i);
}
cv.notify_one(); // wake one waiting consumer
}
{
std::lock_guard<std::mutex> lock(mtx);
done = true;
}
cv.notify_all(); // wake all consumers so they can exit
}
// Consumer thread
void consumer() {
while (true) {
std::unique_lock<std::mutex> lock(mtx);
cv.wait(lock, [&]{ return !work_queue.empty() || done; });
// ↑ atomically releases lock and sleeps; re-acquires on wake
// The predicate lambda handles SPURIOUS WAKEUPS
if (work_queue.empty() && done) break;
int item = work_queue.front();
work_queue.pop();
lock.unlock(); // release lock before processing
process(item);
}
}
Spurious wakeups: The OS may wake a thread even when no notify was called. This is why cv.wait() always takes a predicate (the lambda) — it re-checks the condition after waking. Without the predicate, the consumer would process garbage or crash.
14. Semaphores (C++20)
A semaphore is a counter that controls access to a shared resource. Unlike a mutex (binary: locked or unlocked), a semaphore can allow N concurrent accesses.
#include <semaphore>
// Allow at most 3 threads to access the resource simultaneously
std::counting_semaphore<3> sem(3);
void limited_access() {
sem.acquire(); // decrement counter; blocks if counter is 0
// ... at most 3 threads can be here at once ...
sem.release(); // increment counter; wakes a blocked thread
}
std::binary_semaphore is equivalent to std::counting_semaphore<1> — similar to a mutex, but a semaphore can be released by a different thread than the one that acquired it. This makes semaphores useful for signaling between threads (producer signals consumer), whereas mutexes are strictly for mutual exclusion (lock and unlock must be the same thread).
Pre-C++20 emulation:
class Semaphore {
std::mutex mtx_;
std::condition_variable cv_;
int count_;
public:
explicit Semaphore(int count) : count_(count) {}
void acquire() {
std::unique_lock<std::mutex> lock(mtx_);
cv_.wait(lock, [this]{ return count_ > 0; });
--count_;
}
void release() {
std::lock_guard<std::mutex> lock(mtx_);
++count_;
cv_.notify_one();
}
};
15. Atomics: Lock-Free Programming
For simple shared variables (counters, flags), a full mutex is overkill. std::atomic provides hardware-level atomic operations:
#include <atomic>
std::atomic<int> counter{0};
void increment(int n) {
for (int i = 0; i < n; ++i) {
counter.fetch_add(1, std::memory_order_relaxed);
// or simply: ++counter; (uses seq_cst by default)
}
}
No lock, no blocking, no deadlock. The CPU uses special instructions (e.g., LOCK XADD on x86) to guarantee atomicity.
Memory ordering controls how operations on different atomics are visible across threads:
- memory_order_seq_cst (default): Strongest. All threads see operations in the same order. Easiest to reason about.
- memory_order_relaxed: No ordering guarantees between different atomics. Fastest, but only safe for independent counters.
- memory_order_acquire/memory_order_release: Synchronize a pair — commonly used for producer-consumer patterns (release in producer, acquire in consumer).
Rule: Use seq_cst (the default) unless profiling shows it’s a bottleneck and you deeply understand the memory model.
16. Thread Pools and Task-Based Parallelism
Creating and destroying threads is expensive (~10–100 µs per thread creation). If you need to run thousands of short tasks, spawning a thread per task wastes time on overhead.
A thread pool creates N threads once and assigns tasks from a queue:
// Simplified thread pool (concept)
class ThreadPool {
std::vector<std::thread> workers_;
std::queue<std::function<void()>> tasks_;
std::mutex mtx_;
std::condition_variable cv_;
bool stop_ = false;
public:
ThreadPool(size_t n) {
for (size_t i = 0; i < n; ++i) {
workers_.emplace_back([this] {
while (true) {
std::function<void()> task;
{
std::unique_lock<std::mutex> lock(mtx_);
cv_.wait(lock, [this]{ return stop_ || !tasks_.empty(); });
if (stop_ && tasks_.empty()) return;
task = std::move(tasks_.front());
tasks_.pop();
}
task(); // execute outside the lock
}
});
}
}
void enqueue(std::function<void()> task) {
{
std::lock_guard<std::mutex> lock(mtx_);
tasks_.push(std::move(task));
}
cv_.notify_one();
}
~ThreadPool() {
{ std::lock_guard<std::mutex> lock(mtx_); stop_ = true; }
cv_.notify_all();
for (auto& w : workers_) w.join();
}
};
Usage: pool.enqueue([]{ process_image(img); }); — the task runs on the next available thread. No thread creation overhead.
17. Exception Handling
double divide(double a, double b) {
if (b == 0.0) throw std::invalid_argument("Division by zero");
return a / b;
}
try {
double result = divide(10, 0);
} catch (const std::invalid_argument& e) {
std::cerr << "Error: " << e.what() << "\n";
} catch (const std::exception& e) {
std::cerr << "Unknown std error: " << e.what() << "\n";
} catch (...) {
std::cerr << "Unknown error\n";
}
Stack unwinding: When an exception is thrown, the runtime walks up the call stack, destroying local objects (calling destructors) in each frame until a matching catch is found. This is why RAII works with exceptions — all resources held by stack objects are cleaned up automatically.
Guidelines:
- Throw by value, catch by const reference.
- Only throw for exceptional conditions (out-of-memory, file not found, invalid input). Don’t use exceptions for normal control flow (they are 10–100× slower than a return code on the error path).
- Mark functions that never throw noexcept. This enables optimizations (move constructors should be noexcept so vector can use them safely during reallocation).
18. Memory Layout, Alignment, and Cache Awareness
18.1 Struct Layout and Padding
struct Bad {
char a; // 1 byte + 3 bytes padding (to align next int)
int b; // 4 bytes
char c; // 1 byte + 3 bytes padding (to align struct to 4)
}; // sizeof(Bad) = 12
struct Good {
int b; // 4 bytes
char a; // 1 byte
char c; // 1 byte + 2 bytes padding
}; // sizeof(Good) = 8
Rule: Order struct members from largest to smallest alignment to minimize padding.
18.2 Cache-Friendly Programming
Modern CPU performance is dominated by the memory hierarchy, not raw compute. A cache miss (accessing memory not in L1/L2 cache) costs ~100 cycles; a cache hit costs ~4 cycles. This means:
- std::vector » std::list for iteration. A vector’s elements are contiguous → the prefetcher loads the next cache line automatically. A linked list’s nodes are scattered across the heap → every access is a potential cache miss.
- Array of Structs (AoS) vs. Struct of Arrays (SoA): If you only access one field of a struct in a hot loop, SoA keeps all values of that field contiguous → better cache utilization.
- False sharing: Two threads writing to different variables that happen to share a cache line (64 bytes on x86) cause the cache line to ping-pong between cores. Fix: pad variables to cache line boundary (alignas(64) int counter;).
19. Modern C++ Best Practices (Summary)
- Use RAII for all resources. No raw new/delete, no raw fopen/fclose, no raw lock/unlock.
- Prefer const and constexpr. Make things immutable by default.
- Use smart pointers for heap ownership: unique_ptr (default), shared_ptr (when shared).
- Default to std::vector. Switch to other containers only when justified by measurement.
- Prefer algorithms over raw loops. std::sort, std::transform, std::find_if.
- Use auto for iterator types and complex template return types. Don’t overuse — auto x = 42; is fine, but auto x = compute(); can obscure the type.
- Pass by const& for read-only parameters larger than a pointer. Pass by value if you need a copy anyway (the caller can std::move into it).
- Mark functions noexcept when they won’t throw (especially move operations and destructors).
- Use std::string_view (C++17) for non-owning string references — avoids allocations.
- Write thread-safe code by minimizing shared state, not by adding more locks. The safest synchronization is no synchronization (isolated data, message passing).
- Profile before optimizing. Use perf, valgrind, gperftools, or Tracy to find actual bottlenecks — they are almost never where you think they are.