Databases & Storage: The Engineering Decisions That Define System Reliability

For CTOs and architects optimizing performance, durability, and cost in data systems.

“Should we use Postgres or Mongo?” sounds like a tooling debate. In reality, it’s a systems design decision that shapes correctness, latency, operability, and incident response for years.

This article focuses on five fundamentals that sit under every serious data architecture: SQL vs NoSQL, ACID vs BASE, indexing internals (B-Trees vs LSM), Write-Ahead Logs (WAL), and MVCC. Get these right early, and you avoid rewrites, outages, and expensive constraints later.

5 Decisions That Predict Reliability

SQL vs NoSQL: choose by invariants + query patterns, not trend.
ACID vs BASE: apply consistency per workflow (payments ≠ feeds).
B-Tree vs LSM: B-Trees win for OLTP + range scans; LSM wins for write-heavy ingest (with compaction costs).
WAL: defines where durability actually begins; misconfigurations cause “confirmed” writes to vanish.
MVCC: improves concurrency, but long transactions cause bloat, vacuum pressure, and latency drift.

1. SQL vs. NoSQL: Beyond the Buzzwords

SQL databases enforce relationships, constraints, and ACID transactions. They’re built for correctness and cross-entity invariants—payments, inventory, regulated records.

NoSQL is a category, not a single model:

Document (MongoDB): flexible aggregates
Key-value (Redis/DynamoDB): high-speed lookups
Wide-column (Cassandra): heavy write streams
Graph (Neo4j): relationship traversal

Rule of thumb:

If your system requires “all-or-nothing” across entities → SQL
If your dominant workload is aggregate reads at scale → NoSQL may fit

Most mature systems are polyglot: SQL for transactions, NoSQL for read models and caching.

Wrong choices don’t fail immediately. They fail at scale—when invariants leak into application code and race conditions appear.

2. ACID vs. BASE: The Consistency Philosophies

ACID guarantees atomicity, consistency, isolation, and durability. It protects invariants but costs coordination—and sometimes availability.

BASE favors availability and eventual consistency. It scales well but tolerates temporary anomalies.

The mistake is choosing one globally:

ACID: payments, auth, inventory
BASE: feeds, catalogs, analytics

The right question isn’t “which is better?” It’s: Where can the business tolerate staleness—and where can it not?

3. Indexing: How B-Trees and LSM Trees Work

Indexes define real-world latency.

B-Trees

Optimized for OLTP + range queries
Predictable reads
Slower sustained write throughput under heavy churn

LSM Trees

Optimized for write-heavy workloads
High ingestion performance
Compaction and read amplification can hurt p95/p99

Decision rule:

Mixed reads + range scans → B-Tree
High-volume ingest/logs/time-series → LSM

Index structure isn’t internal plumbing—it directly shapes latency and operational cost.

4. Understanding Write-Ahead Logs (WAL)

Every serious database writes to a Write-Ahead Log (WAL) first:

Append to log
Flush (fsync)
Apply to data
Replay on crash

WAL ensures durability and atomicity, and it powers replication and CDC.

The danger is silent configuration changes—disabling synchronous commits or misunderstanding fsync behavior can turn “successful” writes into data loss during crashes.

Discipline:

Know where your first durable write happens
Instrument it
Chaos-test recovery paths

Durability is not a checkbox—it’s an engineering decision.

5. MVCC: Multi-Version Concurrency Control Demystified

MVCC (Multi-Version Concurrency Control) lets readers and writers proceed simultaneously by keeping multiple row versions.

Benefits:

High concurrency
Snapshot isolation
Fewer lock bottlenecks

Costs:

Long transactions → table bloat
Vacuum pressure
Gradual latency degradation

Operational rules:

Keep transactions short
Monitor vacuum lag and bloat
Push heavy reads to replicas
Choose isolation levels deliberately

MVCC works exactly as designed—until your workload ignores its constraints.

Quick Answers: Databases & Reliability

1. What is WAL (Write-Ahead Logging)?

WAL is an append-only log where the database records changes before applying them to tables/indexes. If the system crashes, it replays the WAL to recover committed data.

2. B-Tree vs LSM Tree — which should I use?

B-Trees are best for OLTP and range queries with predictable reads. LSM trees are best for write-heavy ingest at scale, but compaction can hurt latency if your workload becomes read/range-scan heavy.

3. What is MVCC in databases?

MVCC (Multi-Version Concurrency Control) allows reads and writes to happen concurrently by keeping multiple versions of rows. It improves throughput but can cause bloat if transactions run long.

4. ACID vs BASE — what’s the difference?

ACID gives strong correctness guarantees (atomic, isolated, durable transactions). BASE favors availability and scale with eventual consistency—faster, but you must handle anomalies.

5. SQL vs NoSQL — how do I choose?

Choose based on invariants and access patterns. If you need cross-entity transactional correctness, start with SQL. Use NoSQL where flexibility, scale, or simple aggregate reads dominate.

< previous

How to Design Distributed Systems: Consistency, Availability, Sharding & Replication

Next >

Enterprise UX in Digital Transformation: Lower Risk and Increase Adoption

Next >

We use cookies to enhance the experience on our website. To know more please read our Privacy Policy