Designing Data-Intensive Applications
Part of my series of notes on papers. This is a book, however, so I will only cover the part that is available online, Part 1. I will consider obtaining the rest at the end.
Tags: software design and architecture [Link to article](https://link
Introduction
The Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free?
Alan Kay, in an interview with Dr Dobb’s Journal (2012)
Basic building blocks:
- Databases
- Store data to find it later
- Caches
- Remember the result of an expensive operation.
- Indexes
- Allow users to search by keyword or filter
- Stream Processing
- Send a message to another process, to be handled asynchronously.
- Batch Processing
- Periodically crunch a large amount of accumulated data.
Useful abstractions for design; different trade-offs depending on the implementation.
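A minimal Python sketch (my own illustration, not from the book) of the "cache" building block: remembering the result of an expensive operation so repeated requests are served cheaply. The `expensive_lookup` function and its delay are hypothetical stand-ins for a slow query.

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_lookup(key: str) -> str:
    """Stand-in for an expensive operation, e.g. a slow database query."""
    time.sleep(0.01)  # simulated cost of doing the real work
    return key.upper()

# First call pays the cost; repeated calls return the remembered result.
expensive_lookup("user:42")  # slow path, result gets cached
expensive_lookup("user:42")  # served from the cache
```

The same trade-offs apply as with any cache: you gain speed on repeated reads but must decide when cached entries become stale.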
Thinking about Data Systems
Why lump abstractions like message queues and databases into the same category? Because the distinctions between them have become blurred.
You can make complex/composite data systems from smaller components.
- How do you ensure data remains correct and complete, even with internal errors?
- How do you provide consistently good performance, even when parts degrade?
- How do you scale?
- What’s a good API?
Main ideas behind Data-intensive Applications:
- Reliability
- System should work correctly in the face of adversity.
- Tolerate user mistakes or unexpected use.
- Prevent unauthorized access and abuse.
- “continuing to work correctly even when things go wrong”
- Fault is not failure
- Fault:
- One component deviating from its spec.
- Failure:
- The whole system stops providing the required service.
- It can be worthwhile to deliberately increase faults (by triggering them on purpose) to exercise fault-tolerance machinery and reduce failures.
- Scalability
- System should have reasonable ways of coping with growth in data, traffic, or complexity.
- Maintainability
- Many people should be able to work on the system productively over time.
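The fault-versus-failure distinction above can be sketched in a few lines of Python (my own illustration, not from the book). The "flaky component" and its 30% fault rate are hypothetical; the point is that individual faults are absorbed by a retry loop, so the system as a whole keeps providing its service instead of failing.

```python
import random

class TransientFault(Exception):
    """A fault: one component deviating from its spec."""

def flaky_component(rng) -> str:
    # Hypothetical component that faults about 30% of the time.
    if rng.random() < 0.3:
        raise TransientFault("component deviated from spec")
    return "ok"

def call_with_retry(rng, attempts: int = 5) -> str:
    """Tolerate faults so they do not become a failure of the whole system."""
    for _ in range(attempts):
        try:
            return flaky_component(rng)
        except TransientFault:
            continue  # absorb the fault and try again
    # Only if every attempt faults does the system actually fail:
    raise RuntimeError("failure: system stopped providing the required service")

call_with_retry(random.Random(0))
```

Deliberately raising `TransientFault` more often is one way to "increase faults to reduce failures": it forces the retry path to be exercised regularly instead of only in rare emergencies.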