C++ Primer
1. What C++ Actually Is
C++ is a compiled, statically-typed, multi-paradigm language that gives the programmer direct control over hardware resources (memory layout, cache behavior, thread placement) while providing high-level abstractions (classes, templates, lambdas) that cost nothing at runtime when used correctly.
The central philosophy is the zero-overhead abstraction principle: you don’t pay for what you don’t use, and what you do use, you couldn’t hand-code any better. A std::vector is as fast as a manually managed malloc/realloc array — but it also handles copying, moving, resizing, and cleanup automatically.
Translation model: A C++ program is compiled in translation units (.cpp files). Each .cpp is preprocessed (expanding #include, macros), compiled to an object file (.o), and then the linker combines all object files into an executable. Headers (.h/.hpp) contain declarations (telling the compiler “this function/class exists”); source files contain definitions (the actual code). Understanding this model explains most “undefined reference” or “multiple definition” linker errors.
// hello.cpp — minimal program
#include <iostream> // textual inclusion of I/O declarations
int main() { // entry point; returns int (0 = success)
std::cout << "Hello, World!\n";
return 0;
}
2. The Type System: Why It Matters
C++ is statically typed: every variable’s type is known at compile time, and the compiler enforces type safety (no silent conversion from std::string to int). This is not bureaucracy — it is the compiler catching bugs before the program runs.
2.1 Fundamental Types
| Category | Types | Typical Size |
|---|---|---|
| Boolean | bool | 1 byte |
| Character | char, wchar_t, char16_t, char32_t | 1 / 2–4 / 2 / 4 bytes |
| Integer | short, int, long, long long | 2 / 4 / 4–8 / 8 bytes |
| Floating | float, double, long double | 4 / 8 / 8–16 bytes |
| Void | void | — (incomplete type) |
Use <cstdint> for exact-width types when portability matters: int32_t is always 32 bits, uint64_t is always 64 bits unsigned. Use size_t for sizes and indices — it matches the platform’s pointer width (32-bit on x86, 64-bit on x64).
2.2 Type Qualifiers
- const: “I promise not to modify this.” A const int* p means the int can’t be changed through p (but p itself can point elsewhere). An int* const p means p can’t point elsewhere (but the int can change). const int* const p — both frozen. Being const-correct is not optional in professional C++; it documents intent and enables compiler optimizations.
- constexpr: Evaluated at compile time. constexpr int factorial(int n) { return n <= 1 ? 1 : n * factorial(n-1); } — when called in a constant expression, the compiler computes factorial(10) and embeds the result as a constant. No runtime cost.
- volatile: Tells the compiler “this value may change behind your back” (hardware registers, memory-mapped I/O). Do not use volatile for thread synchronization — it does not provide atomicity or memory ordering. Use std::atomic for that.
2.3 Initialization Styles
int a = 42; // copy initialization
int b(42); // direct initialization
int c{42}; // uniform (brace) initialization — PREFERRED
int d = {42}; // copy-list-initialization
Brace initialization {} is preferred because it prevents narrowing conversions: int x{3.14}; is a compile error (truncation), while int x = 3.14; silently truncates to 3. This catches bugs.
3. Pointers: The Most Important C++ Concept
A pointer is a variable that stores a memory address. Pointers are what make C++ powerful (direct hardware access, dynamic data structures, polymorphism) and dangerous (dangling pointers, wild writes, buffer overflows).
3.1 Basics
int x = 42;
int* p = &x; // p holds the address of x
std::cout << p; // prints the address (e.g., 0x7ffc3a2b1c04)
std::cout << *p; // dereferences p → prints 42
*p = 99; // modifies x through p → x is now 99
- &x: “address-of” operator. Returns the memory address where x lives.
- *p: “dereference” operator. Follows the address stored in p and accesses the value there.
- nullptr: The null pointer literal (C++11). Always initialize pointers: int* p = nullptr;. Dereferencing nullptr is undefined behavior (usually a segfault).
3.2 Pointer Arithmetic
Pointers understand the size of what they point to. If int* p points to arr[0], then p + 1 points to arr[1] (not arr[0] + 1 byte, but arr[0] + sizeof(int) bytes):
int arr[5] = {10, 20, 30, 40, 50};
int* p = arr; // arr decays to a pointer to its first element
std::cout << *(p + 2); // prints 30 (arr[2])
std::cout << p[3]; // equivalent to *(p + 3) → prints 40
This is how C-style arrays work: arr[i] is syntactic sugar for *(arr + i). Understanding this equivalence demystifies a lot of C/C++ behavior.
3.3 Pointers and Functions
void swap(int* a, int* b) {
int temp = *a;
*a = *b;
*b = temp;
}
int x = 1, y = 2;
swap(&x, &y); // x=2, y=1
Passing by pointer (or reference) lets a function modify the caller’s variables. Passing by value creates a copy — the callee works on its own copy, and the caller’s original is unchanged. In C++, prefer references over raw pointers for “I need to modify the caller’s value” because references can’t be null and don’t need dereferencing syntax.
3.4 Pointers to Pointers and Arrays of Pointers
int x = 10;
int* p = &x;
int** pp = &p; // pointer to pointer
std::cout << **pp; // 10
const char* names[] = {"Alice", "Bob", "Charlie"}; // array of pointers to strings
char** argv in int main(int argc, char** argv) is an array of pointers to C-strings — this is how command-line arguments arrive.
3.5 Function Pointers and Callbacks
int add(int a, int b) { return a + b; }
int (*op)(int, int) = &add; // function pointer
std::cout << op(3, 4); // prints 7
Function pointers enable callbacks — passing a function as an argument to another function. In modern C++, prefer std::function<int(int,int)> or lambdas for cleaner syntax and type safety.
4. References: The Safe Alternative
A reference is an alias — a second name for an existing object. Once bound, it cannot be reseated.
int x = 42;
int& ref = x; // ref IS x
ref = 99; // x is now 99
const int& cref = 42; // const ref can bind to a temporary (rvalue)
When to use references vs. pointers:
- Reference: when the target must exist (non-null) and doesn’t change (no reseating). Function parameters, return values.
- Pointer: when the target may not exist (nullptr), or you need pointer arithmetic, or you need to reseat.
- Smart pointer: when you need to express ownership (see Section 6).
4.1 Rvalue References and Move Semantics (C++11)
std::string a = "Hello";
std::string b = std::move(a); // b owns "Hello"; a is left in a valid but unspecified state
An rvalue reference (T&&) binds to temporaries (values about to be destroyed). Move semantics transfer ownership of resources (heap memory, file handles) instead of copying them. Moving a std::vector of 1 million elements is $O(1)$ (just swap three pointers) vs. $O(n)$ for copying. This is why returning large objects from functions is efficient in modern C++.
5. Stack vs. Heap: Memory Layout
Understanding where data lives is fundamental to writing correct and fast C++.
5.1 The Stack
The stack is a contiguous block of memory managed automatically by the compiler. Every time a function is called, a stack frame is pushed containing:
- Local variables
- Function arguments (or references/pointers to them)
- Return address
When the function returns, the frame is popped — all local variables are destroyed. This is automatic storage duration.
void foo() {
int x = 42; // x lives on the stack
double arr[100]; // 800 bytes on the stack
} // x and arr are destroyed here — no manual cleanup
Why the stack is fast: Allocating on the stack is just moving the stack pointer (a single instruction). No bookkeeping, no fragmentation, no system calls. This is why stack allocation is ~100× faster than heap allocation.
The danger: Stack size is limited (typically 1–8 MB). double big[1000000]; (8 MB) will overflow the stack → crash with SIGSEGV. Large or variable-size data belongs on the heap.
5.2 The Heap
The heap (or free store) is a large pool of memory managed by the runtime allocator (malloc/free in C, new/delete in C++). Objects on the heap persist until explicitly freed.
int* p = new int(42); // allocate 4 bytes on the heap, initialize to 42
// ... p is valid here, even outside the allocating function
delete p; // free the memory. MUST be done, or memory leaks.
p = nullptr; // good practice: prevent dangling pointer use
Why the heap exists: For data whose size is unknown at compile time (user input, network messages), or whose lifetime must exceed the creating function (returning a dynamically built data structure).
Why the heap is dangerous:
- Memory leak: Forget delete → memory is never freed → program grows until OOM.
- Double free: Call delete twice → undefined behavior (corruption, crash).
- Dangling pointer: Use a pointer after delete → reading garbage or crashing.
- Fragmentation: Many small allocations/deallocations scatter memory, hurting cache performance.
5.3 Stack vs. Heap: Summary
| | Stack | Heap |
|---|---|---|
| Speed | Extremely fast (pointer bump) | Slower (allocator bookkeeping) |
| Size | Small (1–8 MB typical) | Large (limited by OS/RAM) |
| Lifetime | Automatic (scope-bound) | Manual (until delete or RAII) |
| Fragmentation | None | Possible |
| Thread safety | Each thread has its own stack | Shared — needs synchronization |
Rule of thumb: Default to stack. Use heap only when you need dynamic size, dynamic lifetime, or polymorphism (base-class pointers to derived objects).
6. Dynamic Memory and RAII
6.1 Raw new and delete (The Old Way)
int* arr = new int[1000]; // allocate array of 1000 ints on the heap
arr[0] = 42;
// ... use arr ...
delete[] arr; // MUST use delete[] for arrays (not delete)
The problem: if an exception is thrown between new and delete, the delete never runs → leak. If a function has multiple return paths, you must delete before each one. This is error-prone and the #1 source of bugs in legacy C++.
6.2 Smart Pointers (The Modern Way)
RAII (Resource Acquisition Is Initialization): Tie the resource lifetime to an object’s lifetime. When the object is destroyed (goes out of scope), its destructor frees the resource. Smart pointers implement RAII for heap memory:
std::unique_ptr<T> — Sole ownership. Cannot be copied (that would create two owners), but can be moved.
#include <memory>
auto p = std::make_unique<int>(42); // heap allocation, wrapped in unique_ptr
std::cout << *p; // use like a regular pointer
// no delete needed — when p goes out of scope, the int is freed
std::shared_ptr<T> — Shared ownership via reference counting. The object is destroyed when the last shared_ptr to it is destroyed.
auto p1 = std::make_shared<std::string>("Hello");
auto p2 = p1; // ref count = 2
p1.reset(); // ref count = 1, string still alive
// when p2 goes out of scope, ref count = 0, string is freed
std::weak_ptr<T> — Non-owning observer of a shared_ptr. Does not increment the ref count. Used to break reference cycles (e.g., parent-child relationships in trees/graphs where both point to each other).
The Rule: In modern C++ (C++11 and later), never write new or delete directly. Use std::make_unique or std::make_shared. If you see raw new/delete in production code, it is almost certainly a bug waiting to happen.
6.3 RAII Beyond Memory
RAII applies to any resource:
{
std::lock_guard<std::mutex> lock(mtx); // acquires mutex
// ... critical section ...
} // lock_guard destructor releases mutex — even if an exception is thrown
{
std::ofstream file("data.txt"); // opens file
file << "Hello";
} // ofstream destructor closes file — no fclose() needed
{
auto conn = db.connect(); // acquires DB connection from pool
conn.query("SELECT ...");
} // conn destructor returns connection to pool
This pattern eliminates resource leaks by design. If your class manages a resource, implement the Rule of Five (destructor, copy constructor, copy assignment, move constructor, move assignment) or, better, the Rule of Zero (compose your class from members that already handle their own resources — unique_ptr, string, vector — and the compiler-generated defaults do the right thing).
7. OOP: Classes, Inheritance, and Polymorphism
7.1 Classes and Encapsulation
class BankAccount {
private:
double balance_; // invariant: balance_ >= 0
std::string owner_;
public:
BankAccount(std::string owner, double initial)
: balance_(initial), owner_(std::move(owner)) { // initializers listed in member declaration order
if (initial < 0) throw std::invalid_argument("Negative initial balance");
}
void deposit(double amount) {
if (amount <= 0) throw std::invalid_argument("Non-positive deposit");
balance_ += amount;
}
bool withdraw(double amount) {
if (amount > balance_) return false; // insufficient funds
balance_ -= amount;
return true;
}
double balance() const { return balance_; } // const: doesn't modify state
};
Encapsulation means: the class owns its invariants. External code cannot set balance_ to $-1000$ because balance_ is private. All mutations go through deposit() and withdraw(), which enforce the rules. This is why we have private — not to hide things, but to guarantee correctness.
7.2 Inheritance and Polymorphism
class Shape {
public:
virtual double area() const = 0; // pure virtual → Shape is abstract
virtual ~Shape() = default; // MUST be virtual if using base pointers
};
class Circle : public Shape {
double r_;
public:
Circle(double r) : r_(r) {}
double area() const override { return 3.14159 * r_ * r_; }
};
class Rectangle : public Shape {
double w_, h_;
public:
Rectangle(double w, double h) : w_(w), h_(h) {}
double area() const override { return w_ * h_; }
};
// Polymorphic use:
void printArea(const Shape& s) {
std::cout << s.area() << "\n"; // calls the right version at runtime
}
Virtual dispatch uses a vtable (virtual function table): each polymorphic class has a hidden table of function pointers. When you call s.area(), the runtime looks up the correct function for the actual derived type. This costs one indirection (pointer lookup) per call — negligible in most code, but measurable in tight inner loops (which is why game engines sometimes avoid virtual calls in hot paths).
Always make the base class destructor virtual: If you delete a Shape* that actually points to a Circle, a non-virtual destructor would only call Shape::~Shape(), leaking Circle’s resources.
8. Templates and Generic Programming
Templates are C++’s mechanism for write-once, use-with-any-type code. They are resolved at compile time — no runtime overhead.
template<typename T>
T max_val(T a, T b) {
return (a > b) ? a : b;
}
// Usage:
max_val(3, 5); // T = int
max_val(3.14, 2.71); // T = double
max_val<std::string>("alpha", "beta"); // T = std::string
The compiler generates a separate specialization for each type used. max_val<int> and max_val<double> are two different functions in the binary. This is zero-overhead (no type erasure, no boxing), but can increase binary size if many types are used (code bloat).
8.1 Class Templates
template<typename T>
class Stack {
std::vector<T> data_;
public:
void push(const T& val) { data_.push_back(val); }
T pop() {
T top = std::move(data_.back());
data_.pop_back();
return top;
}
bool empty() const { return data_.empty(); }
};
Stack<int> intStack;
Stack<std::string> strStack;
std::vector, std::map, std::unique_ptr — all STL containers and smart pointers are class templates.
8.2 Lambdas (Anonymous Functions)
auto square = [](int x) { return x * x; };
std::cout << square(5); // 25
std::vector<int> v = {3, 1, 4, 1, 5};
std::sort(v.begin(), v.end(), [](int a, int b) { return a > b; });
// v is now {5, 4, 3, 1, 1}
Capture list []: controls what outer variables the lambda can access.
- [=]: capture everything by value (copy).
- [&]: capture everything by reference.
- [x, &y]: capture x by value, y by reference.
- [this]: capture the enclosing object’s this pointer.
Lambdas are the idiomatic way to pass behavior to STL algorithms, thread constructors, and async operations.
9. STL Containers and Algorithms
9.1 Choosing the Right Container
| Container | Underlying | Insert | Lookup | Iterate | Use When |
|---|---|---|---|---|---|
| vector | Dynamic array | $O(1)$ amortized back | $O(n)$ | Fastest (contiguous) | Default choice |
| deque | Chunked array | $O(1)$ front/back | $O(n)$ | Good | Need front insertion |
| list | Doubly-linked | $O(1)$ anywhere (with iterator) | $O(n)$ | Poor cache | Frequent mid-insert/erase |
| set/map | Red-black tree | $O(\log n)$ | $O(\log n)$ | In-order | Sorted access needed |
| unordered_set/map | Hash table | $O(1)$ average | $O(1)$ average | No order | Fast lookup, no ordering |
Default to std::vector. Even for 1000 elements, linear search in a vector often beats std::set lookup because contiguous memory is cache-friendly. Only switch to a tree/hash container when profiling shows it matters.
9.2 Algorithms
The <algorithm> header provides ~100 generic algorithms that operate on iterator ranges:
std::vector<int> v = {5, 3, 8, 1, 9};
std::sort(v.begin(), v.end()); // {1, 3, 5, 8, 9}
auto it = std::find(v.begin(), v.end(), 8); // iterator to 8
int sum = std::accumulate(v.begin(), v.end(), 0); // 26
std::transform(v.begin(), v.end(), v.begin(),
[](int x){ return x * 2; }); // {2, 6, 10, 16, 18}
bool all_pos = std::all_of(v.begin(), v.end(),
[](int x){ return x > 0; }); // true
Prefer algorithms over raw loops. std::sort is typically an introsort (hybrid quicksort/heapsort/insertion) that’s heavily optimized. Hand-rolling a sort loop is slower, longer, and buggier.
10. I/O: Streams, Files, and Performance
10.1 Stream I/O Basics
#include <iostream>
#include <fstream>
#include <sstream>
// Console
std::cout << "Enter a number: ";
int n;
std::cin >> n;
// File
std::ofstream out("output.txt");
out << "Result: " << n * 2 << "\n";
out.close(); // or let destructor handle it (RAII)
std::ifstream in("output.txt");
std::string line;
while (std::getline(in, line)) {
std::cout << line << "\n";
}
// String stream (in-memory formatting)
std::ostringstream oss;
oss << "value=" << 42;
std::string s = oss.str(); // "value=42"
10.2 I/O-Bound vs. CPU-Bound Work
This distinction is critical for designing efficient programs:
I/O-Bound: The program spends most of its time waiting — for disk reads, network responses, database queries, user input. The CPU is idle. Adding more CPU cores doesn’t help; you need:
- Asynchronous I/O (non-blocking reads/writes; epoll or io_uring on Linux).
- Event-driven architecture (single thread with an event loop — the Node.js model, or Boost.Asio in C++).
- Coroutines (C++20 co_await) to write async code that reads like synchronous code.
- A thread pool where many I/O-waiting tasks share a small number of threads.
CPU-Bound: The program spends most of its time computing — matrix multiplication, physics simulation, image processing, compression. Adding more cores helps directly:
- Parallel algorithms (std::execution::par in C++17).
- Manual threading with careful work partitioning.
- SIMD (Single Instruction, Multiple Data — process 4/8/16 values at once via intrinsics or auto-vectorization).
- Avoid false sharing (different threads writing to the same cache line).
Practical example — reading and processing a large file:
// BAD: I/O and CPU interleaved — CPU waits for I/O, I/O waits for CPU
for (auto& chunk : file) {
process(chunk); // CPU work stalls while next chunk loads
}
// BETTER: overlap I/O and computation with double buffering
// Thread A reads chunk N+1 while Thread B processes chunk N
std::thread reader([&]{ /* read chunks into buffer A/B alternately */ });
std::thread processor([&]{ /* process from buffer A/B alternately */ });
10.3 Buffered I/O Performance Tips
- std::ios::sync_with_stdio(false); — disconnects C++ streams from C stdio. Can speed up cin/cout by 10×. Add std::cin.tie(nullptr); to decouple cin from cout flushing.
- Prefer '\n' over std::endl. std::endl flushes the buffer on every line — devastating for bulk output.
- For binary files, use std::ios::binary mode and read()/write() with raw buffers instead of formatted <</>>.
- Memory-mapped I/O (mmap on Linux) lets you treat a file as an array in memory — the OS handles paging. Fastest for random-access reads of large files.
11. Concurrency: Threads
11.1 Creating Threads
#include <thread>
#include <iostream>
void worker(int id, int iterations) {
for (int i = 0; i < iterations; ++i) {
// do work
}
std::cout << "Worker " << id << " done\n";
}
int main() {
std::thread t1(worker, 1, 1000000);
std::thread t2(worker, 2, 1000000);
t1.join(); // wait for t1 to finish
t2.join(); // wait for t2 to finish
}
join() vs detach():
- join(): The calling thread blocks until the spawned thread finishes. Always use join() unless you have a very specific reason not to.
- detach(): The thread runs independently. When it finishes, its resources are cleaned up. But if main() exits while a detached thread is running, the behavior is undefined. Detaching is rarely the right choice.
Thread with a lambda (common pattern):
int result = 0;
std::thread t([&result]() {
result = expensive_computation();
});
t.join();
std::cout << result;
11.2 std::async and std::future
For simpler task-based parallelism:
#include <future>
auto future = std::async(std::launch::async, []() {
return expensive_computation();
});
// ... do other work while computation runs ...
int result = future.get(); // blocks until result is ready
std::async manages the thread lifecycle for you and propagates exceptions through the future (calling future.get() re-throws any exception from the async task).
11.3 Hardware Concurrency
unsigned int n = std::thread::hardware_concurrency();
// Returns the number of concurrent threads supported (e.g., 8 for a 4-core/8-thread CPU).
// May return 0 if the value cannot be determined — fall back to a sensible default.
// Use this to size your thread pool.
12. Synchronization: Mutex, Lock Guards, and Deadlocks
When multiple threads access shared data, data races occur: two threads read/write the same memory location without synchronization → undefined behavior (corrupted data, crashes, impossible debugging).
12.1 std::mutex
A mutex (mutual exclusion) is a lock that only one thread can hold at a time:
#include <mutex>
std::mutex mtx;
int shared_counter = 0;
void increment(int n) {
for (int i = 0; i < n; ++i) {
mtx.lock(); // acquire the lock — other threads block here
++shared_counter; // only one thread executes this at a time
mtx.unlock(); // release the lock
}
}
Problem: If an exception is thrown between lock() and unlock(), the mutex is never released → deadlock (all threads waiting forever).
12.2 std::lock_guard and std::unique_lock (RAII Locks)
void increment_safe(int n) {
for (int i = 0; i < n; ++i) {
std::lock_guard<std::mutex> lock(mtx); // acquires mutex
++shared_counter;
} // lock_guard destructor releases mutex — even if exception thrown
}
- std::lock_guard: Simple RAII wrapper. Locks in constructor, unlocks in destructor. Cannot be moved or re-locked. Use for simple critical sections.
- std::unique_lock: More flexible. Can be deferred-locked, timed, re-locked, or moved. Required for use with condition variables. Slightly more overhead.
- std::scoped_lock (C++17): Locks multiple mutexes simultaneously without deadlock risk (uses a deadlock-avoidance algorithm internally):
std::mutex m1, m2;
std::scoped_lock lock(m1, m2); // locks both — no risk of ABBA deadlock
12.3 Deadlock
Deadlock occurs when two or more threads each hold a lock and wait for the other’s lock:
Thread A: locks m1, then tries to lock m2 → blocks (m2 held by B)
Thread B: locks m2, then tries to lock m1 → blocks (m1 held by A)
Both wait forever.
Prevention:
- Lock ordering: Always acquire mutexes in the same global order.
- std::scoped_lock: Atomically acquires multiple mutexes.
- std::try_lock: Non-blocking attempt — if the lock isn’t available, do something else.
- Minimize lock scope: Hold the lock for as little time as possible.
12.4 Reader-Writer Locks
#include <shared_mutex>
std::shared_mutex rw_mtx;
void reader() {
std::shared_lock lock(rw_mtx); // multiple readers can hold simultaneously
// read shared data
}
void writer() {
std::unique_lock lock(rw_mtx); // exclusive — blocks until all readers release
// modify shared data
}
Use when reads vastly outnumber writes (e.g., a configuration map that’s read 1000×/sec and updated 1×/min).
13. Condition Variables: Waiting for Events
A condition variable lets a thread sleep until another thread signals that a condition is met. This is far more efficient than busy-waiting (spinning in a loop checking a flag).
#include <condition_variable>
std::mutex mtx;
std::condition_variable cv;
std::queue<int> work_queue;
bool done = false;
// Producer thread
void producer() {
for (int i = 0; i < 100; ++i) {
{
std::lock_guard<std::mutex> lock(mtx);
work_queue.push(i);
}
cv.notify_one(); // wake one waiting consumer
}
{
std::lock_guard<std::mutex> lock(mtx);
done = true;
}
cv.notify_all(); // wake all consumers so they can exit
}
// Consumer thread
void consumer() {
while (true) {
std::unique_lock<std::mutex> lock(mtx);
cv.wait(lock, [&]{ return !work_queue.empty() || done; });
// ↑ atomically releases lock and sleeps; re-acquires on wake
// The predicate lambda handles SPURIOUS WAKEUPS
if (work_queue.empty() && done) break;
int item = work_queue.front();
work_queue.pop();
lock.unlock(); // release lock before processing
process(item);
}
}
Spurious wakeups: The OS may wake a thread even when no notify was called. This is why cv.wait() always takes a predicate (the lambda) — it re-checks the condition after waking. Without the predicate, the consumer would process garbage or crash.
14. Semaphores (C++20)
A semaphore is a counter that controls access to a shared resource. Unlike a mutex (binary: locked or unlocked), a semaphore can allow N concurrent accesses.
#include <semaphore>
// Allow at most 3 threads to access the resource simultaneously
std::counting_semaphore<3> sem(3);
void limited_access() {
sem.acquire(); // decrement counter; blocks if counter is 0
// ... at most 3 threads can be here at once ...
sem.release(); // increment counter; wakes a blocked thread
}
std::binary_semaphore is equivalent to std::counting_semaphore<1> — similar to a mutex, but a semaphore can be released by a different thread than the one that acquired it. This makes semaphores useful for signaling between threads (producer signals consumer), whereas mutexes are strictly for mutual exclusion (lock and unlock must be the same thread).
Pre-C++20 emulation:
class Semaphore {
std::mutex mtx_;
std::condition_variable cv_;
int count_;
public:
explicit Semaphore(int count) : count_(count) {}
void acquire() {
std::unique_lock<std::mutex> lock(mtx_);
cv_.wait(lock, [this]{ return count_ > 0; });
--count_;
}
void release() {
std::lock_guard<std::mutex> lock(mtx_);
++count_;
cv_.notify_one();
}
};
15. Atomics: Lock-Free Programming
For simple shared variables (counters, flags), a full mutex is overkill. std::atomic provides hardware-level atomic operations:
#include <atomic>
std::atomic<int> counter{0};
void increment(int n) {
for (int i = 0; i < n; ++i) {
counter.fetch_add(1, std::memory_order_relaxed);
// or simply: ++counter; (uses seq_cst by default)
}
}
No lock, no blocking, no deadlock. The CPU uses special instructions (e.g., LOCK XADD on x86) to guarantee atomicity.
Memory ordering controls how operations on different atomics are visible across threads:
- memory_order_seq_cst (default): Strongest. All threads see operations in the same order. Easiest to reason about.
- memory_order_relaxed: No ordering guarantees between different atomics. Fastest, but only safe for independent counters.
- memory_order_acquire/memory_order_release: Synchronize a pair — commonly used for producer-consumer patterns (release in producer, acquire in consumer).
Rule: Use seq_cst (the default) unless profiling shows it’s a bottleneck and you deeply understand the memory model.
16. Thread Pools and Task-Based Parallelism
Creating and destroying threads is expensive (~10–100 µs per thread creation). If you need to run thousands of short tasks, spawning a thread per task wastes time on overhead.
A thread pool creates N threads once and assigns tasks from a queue:
// Simplified thread pool (concept)
class ThreadPool {
std::vector<std::thread> workers_;
std::queue<std::function<void()>> tasks_;
std::mutex mtx_;
std::condition_variable cv_;
bool stop_ = false;
public:
ThreadPool(size_t n) {
for (size_t i = 0; i < n; ++i) {
workers_.emplace_back([this] {
while (true) {
std::function<void()> task;
{
std::unique_lock<std::mutex> lock(mtx_);
cv_.wait(lock, [this]{ return stop_ || !tasks_.empty(); });
if (stop_ && tasks_.empty()) return;
task = std::move(tasks_.front());
tasks_.pop();
}
task(); // execute outside the lock
}
});
}
}
void enqueue(std::function<void()> task) {
{
std::lock_guard<std::mutex> lock(mtx_);
tasks_.push(std::move(task));
}
cv_.notify_one();
}
~ThreadPool() {
{ std::lock_guard<std::mutex> lock(mtx_); stop_ = true; }
cv_.notify_all();
for (auto& w : workers_) w.join();
}
};
Usage: pool.enqueue([]{ process_image(img); }); — the task runs on the next available thread. No thread creation overhead.
17. Exception Handling
double divide(double a, double b) {
if (b == 0.0) throw std::invalid_argument("Division by zero");
return a / b;
}
try {
double result = divide(10, 0);
} catch (const std::invalid_argument& e) {
std::cerr << "Error: " << e.what() << "\n";
} catch (const std::exception& e) {
std::cerr << "Unknown std error: " << e.what() << "\n";
} catch (...) {
std::cerr << "Unknown error\n";
}
Stack unwinding: When an exception is thrown, the runtime walks up the call stack, destroying local objects (calling destructors) in each frame until a matching catch is found. This is why RAII works with exceptions — all resources held by stack objects are cleaned up automatically.
Guidelines:
- Throw by value, catch by const reference.
- Only throw for exceptional conditions (out-of-memory, file not found, invalid input). Don’t use exceptions for normal control flow (they are 10–100× slower than a return code on the error path).
- Mark functions that never throw noexcept. This enables optimizations (move constructors should be noexcept so vector can use them safely during reallocation).
18. Memory Layout, Alignment, and Cache Awareness
18.1 Struct Layout and Padding
struct Bad {
char a; // 1 byte + 3 bytes padding (to align next int)
int b; // 4 bytes
char c; // 1 byte + 3 bytes padding (to align struct to 4)
}; // sizeof(Bad) = 12
struct Good {
int b; // 4 bytes
char a; // 1 byte
char c; // 1 byte + 2 bytes padding
}; // sizeof(Good) = 8
Rule: Order struct members from largest to smallest alignment to minimize padding.
18.2 Cache-Friendly Programming
Modern CPU performance is dominated by the memory hierarchy, not raw compute. A cache miss (accessing memory not in L1/L2 cache) costs ~100 cycles; a cache hit costs ~4 cycles. This means:
- std::vector » std::list for iteration. A vector’s elements are contiguous → the prefetcher loads the next cache line automatically. A linked list’s nodes are scattered across the heap → every access is a potential cache miss.
- Array of Structs (AoS) vs. Struct of Arrays (SoA): If you only access one field of a struct in a hot loop, SoA keeps all values of that field contiguous → better cache utilization.
- False sharing: Two threads writing to different variables that happen to share a cache line (64 bytes on x86) cause the cache line to ping-pong between cores. Fix: pad variables to cache line boundary (alignas(64) int counter;).
19. Modern C++ Best Practices (Summary)
- Use RAII for all resources. No raw new/delete, no raw fopen/fclose, no raw lock/unlock.
- Prefer const and constexpr. Make things immutable by default.
- Use smart pointers for heap ownership: unique_ptr (default), shared_ptr (when shared).
- Default to std::vector. Switch to other containers only when justified by measurement.
- Prefer algorithms over raw loops. std::sort, std::transform, std::find_if.
- Use auto for iterator types and complex template return types. Don’t overuse — auto x = 42; is fine, but auto x = compute(); can obscure the type.
- Pass by const& for read-only parameters larger than a pointer. Pass by value if you need a copy anyway (the caller can std::move into it).
- Mark functions noexcept when they won’t throw (especially move operations and destructors).
- Use std::string_view (C++17) for non-owning string references — avoids allocations.
- Write thread-safe code by minimizing shared state, not by adding more locks. The safest synchronization is no synchronization (isolated data, message passing).
- Profile before optimizing. Use perf, valgrind, gperftools, or Tracy to find actual bottlenecks — they are almost never where you think they are.