Designing Data-Intensive Applications
Part of my series of notes on papers. This is a book, however, so I will only cover the part that is available online, Part 1. I will consider obtaining the rest at the end.
Tags: software design and architecture [Link to article](https://link
Introduction
The Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free?
Alan Kay, in an interview with Dr Dobb’s Journal (2012)
Basic building blocks:
- Databases
- Store data to find it later
- Caches
- Remember the result of an expensive operation.
- Indexes
- Allow users to search by keyword or filter
- Stream Processing
- Send a message to another process, to be handled asynchronously.
- Batch Processing
- Periodically crunch a large amount of accumulated data.
Useful abstractions for design; different trade-offs depending on the implementation.
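A minimal Python sketch (my own illustration, not from the book) of the "cache" building block: remembering the result of an expensive operation so repeated requests are served cheaply. The `expensive_lookup` function and its delay are hypothetical stand-ins for a slow query.

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_lookup(key: str) -> str:
    """Stand-in for an expensive operation, e.g. a slow database query."""
    time.sleep(0.01)  # simulated cost of doing the real work
    return key.upper()

# First call pays the cost; repeated calls return the remembered result.
expensive_lookup("user:42")  # slow path, result gets cached
expensive_lookup("user:42")  # served from the cache
```

The same trade-offs apply as with any cache: you gain speed on repeated reads but must decide when cached entries become stale.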
Thinking about Data Systems
Why lump abstractions like message queues and databases into the same category? Because the distinctions between them have become blurred.
You can make complex/composite data systems from smaller components.
- How do you ensure data remains correct and complete, even with internal errors?
- How do you provide consistently good performance, even when parts degrade?
- How do you scale?
- What’s a good API?
Main ideas behind Data-intensive Applications:
- Reliability
- System should work correctly in the face of adversity.
- Tolerate user mistakes or unexpected use.
- Prevent unauthorized access and abuse.
- “continuing to work correctly even when things go wrong”
- Fault is not failure
- Fault:
- One component deviating from its spec.
- Failure:
- The whole system stops providing the required service.
- It can be worthwhile to deliberately increase faults (by triggering them on purpose) to exercise fault-tolerance machinery and reduce failures.
- Scalability
- System should have reasonable ways of coping with growth in data, traffic, or complexity.
- Maintainability
- Many people should be able to work on the system productively over time.
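The fault-versus-failure distinction above can be sketched in a few lines of Python (my own illustration, not from the book). The "flaky component" and its 30% fault rate are hypothetical; the point is that individual faults are absorbed by a retry loop, so the system as a whole keeps providing its service instead of failing.

```python
import random

class TransientFault(Exception):
    """A fault: one component deviating from its spec."""

def flaky_component(rng) -> str:
    # Hypothetical component that faults about 30% of the time.
    if rng.random() < 0.3:
        raise TransientFault("component deviated from spec")
    return "ok"

def call_with_retry(rng, attempts: int = 5) -> str:
    """Tolerate faults so they do not become a failure of the whole system."""
    for _ in range(attempts):
        try:
            return flaky_component(rng)
        except TransientFault:
            continue  # absorb the fault and try again
    # Only if every attempt faults does the system actually fail:
    raise RuntimeError("failure: system stopped providing the required service")

call_with_retry(random.Random(0))
```

Deliberately raising `TransientFault` more often is one way to "increase faults to reduce failures": it forces the retry path to be exercised regularly instead of only in rare emergencies.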