Create distributed_systems.md #13

Merged
merged 1 commit into from
Mar 1, 2025
79 changes: 79 additions & 0 deletions Paper-Shelf/distributed_systems.md
@@ -0,0 +1,79 @@
Here’s a curated list of **seminal papers** in distributed systems that are frequently referenced in interviews and system design discussions. The papers cover foundational theory, production systems, and influential designs:

---

### **Foundational Theory & Concepts**
1. **[Time, Clocks, and the Ordering of Events in a Distributed System (Lamport, 1978)](https://lamport.azurewebsites.net/pubs/time-clocks.pdf)**
- Introduces **Lamport clocks** and the concept of causality in distributed systems. A must-read for understanding event ordering.

2. **[Impossibility of Distributed Consensus with One Faulty Process (FLP Impossibility, 1985)](https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf)**
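- Fischer, Lynch, and Paterson prove that no deterministic protocol can guarantee consensus in a fully asynchronous system if even one process may crash. Critical for understanding why practical consensus algorithms rely on timeouts, randomization, or failure detectors.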

3. **[CAP Theorem (Brewer, 2000)](https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed/)**
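- The link is Brewer’s 2012 retrospective, “CAP Twelve Years Later”; the conjecture itself dates from his 2000 PODC keynote and was formally proved by Gilbert and Lynch in 2002. Explains the trade-offs among consistency, availability, and partition tolerance.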

4. **[PACELC: Revisiting the CAP Theorem (Abadi, 2012)](https://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf)**
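- Extends CAP with **latency**: during a partition (P) a system trades availability against consistency (A/C); else (E), in normal operation, it trades latency against consistency (L/C).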
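
To make the Lamport clock from the first paper concrete, here is a minimal sketch of the logical-clock rules: increment on every local event and send, and on receive take the maximum of the local and received timestamps plus one. The class and method names are illustrative, not taken from the paper.

```python
class LamportClock:
    """Logical clock: a counter that respects the happens-before relation."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Rule 1: increment before every local event.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # Rule 2: jump past both the local clock and the sender's timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                 # local event on a
stamp = a.send()         # a sends a message to b
b.tick()                 # unrelated local event on b
recv = b.receive(stamp)  # b receives a's message
print(stamp, recv)
```

The receive timestamp is always greater than the send timestamp, so causally related events are ordered. The converse does not hold: a smaller timestamp does not imply that one event caused the other.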

---

### **Consensus & Coordination**
5. **[The Part-Time Parliament (Paxos, Lamport, 1998)](https://lamport.azurewebsites.net/pubs/lamport-paxos.pdf)**
- The original Paxos paper. Required reading for understanding consensus in fault-tolerant systems.

6. **[In Search of an Understandable Consensus Algorithm (Raft, 2014)](https://raft.github.io/raft.pdf)**
- Introduces **Raft**, a consensus algorithm designed to be easier to understand than Paxos. Used by etcd (the key-value store behind Kubernetes) and similar systems.

7. **[Viewstamped Replication (VR, 1988)](https://pmg.csail.mit.edu/papers/vr-revisited.pdf)**
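- Developed independently of Paxos at around the same time, and a precursor to Raft. The link points to the 2012 *Viewstamped Replication Revisited* write-up, which explains replication and consensus for state machine replication.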
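
The crash-fault protocols in this section (Paxos, Raft, Viewstamped Replication) all rely on majority quorums. The sketch below illustrates the arithmetic behind that choice: a cluster of `n` nodes needs `n // 2 + 1` votes, and any two majorities share at least one node. The function names are hypothetical and exist only for this illustration.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Crashed nodes the cluster can lose while a majority stays reachable."""
    return n - majority(n)


def majorities_always_overlap(n: int) -> bool:
    """Brute-force check that every pair of majority quorums intersects."""
    quorums = [set(q) for q in combinations(range(n), majority(n))]
    return all(q1 & q2 for q1 in quorums for q2 in quorums)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n), majorities_always_overlap(n))
```

The overlap is what lets a new leader or proposer learn about any value that an earlier majority already accepted. It also shows why a 4-node cluster tolerates no more failures than a 3-node one.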

---

### **Distributed Storage Systems**
8. **[Dynamo: Amazon’s Highly Available Key-Value Store (2007)](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf)**
- Combines consistent hashing, vector clocks, sloppy quorums, and eventual consistency in a decentralized, always-writable design. Inspired Cassandra, Riak, and more.

9. **[Bigtable: A Distributed Storage System for Structured Data (2006)](https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf)**
- Google’s paper on Bigtable, a foundational system for wide-column NoSQL databases (e.g., HBase, Cassandra).

10. **[Spanner: Google’s Globally Distributed Database (2012)](https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf)**
- Introduces **TrueTime** and externally consistent transactions across datacenters. Critical for understanding distributed transactions.

11. **[The Google File System (2003)](https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf)**
- Inspired Hadoop HDFS. Explains distributed file systems for large-scale data processing.
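
As an illustration of the vector clocks mentioned in the Dynamo entry, the sketch below compares two versions of an object to decide whether one supersedes the other or whether they are concurrent and need reconciliation. It is a simplified model using plain dictionaries; the node names and helper functions are assumptions for this example, not Dynamo’s actual interface.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen everything clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "after"       # a supersedes b
    if descends(b, a):
        return "before"      # b supersedes a
    return "concurrent"      # conflicting versions; must be reconciled


def merge(a: dict, b: dict) -> dict:
    """Entry-wise maximum, used as the basis for a reconciled version."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


v1 = {"sx": 1}            # first write, handled by node sx
v2 = {"sx": 2}            # second write through sx
v3 = {"sx": 1, "sy": 1}   # write based on v1, handled by node sy

print(compare(v2, v1))    # v2 supersedes v1
print(compare(v2, v3))    # neither supersedes the other
print(merge(v2, v3))
```

Unlike a single Lamport timestamp, a vector clock can detect concurrency, which is why it suits a store that accepts conflicting writes and reconciles them later.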

---

### **Distributed Computing Models**
12. **[MapReduce: Simplified Data Processing on Large Clusters (2004)](https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf)**
- Google’s paper on MapReduce, the backbone of Hadoop and batch processing systems.

13. **[Resilient Distributed Datasets (Spark, 2012)](https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf)**
- Introduces resilient distributed datasets (RDDs), the fault-tolerant in-memory abstraction behind Apache Spark. Key for modern data processing.
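
The MapReduce programming model can be sketched on a single machine with the classic word-count example. This toy version only shows the map, shuffle (group by key), and reduce phases; it omits everything that makes the real system interesting, such as partitioning, distribution across workers, and fault tolerance. All function names here are illustrative.

```python
from collections import defaultdict


def map_fn(document: str):
    # Map phase: emit a (word, 1) pair for every word in the input.
    for word in document.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: list):
    # Reduce phase: combine all values emitted for one key.
    return word, sum(counts)


def run_mapreduce(inputs, mapper, reducer) -> dict:
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


counts = run_mapreduce(["the quick fox", "the lazy dog", "The end"], map_fn, reduce_fn)
print(counts)
```

Because `map_fn` and `reduce_fn` are pure functions of their inputs, a framework can rerun them on another machine after a failure, which is the property the paper builds its fault tolerance on.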

---

### **Other Notable Papers**
14. **[Kafka: A Distributed Messaging System (2011)](https://www.kai-waehner.de/blog/2021/04/20/apache-kafka-10-years-later-linkedin-original-paper-kafka-connect-turbine/)**
- LinkedIn’s original design for Kafka. The link is a retrospective blog post that discusses the original paper.

15. **[The Chubby Lock Service (2006)](https://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf)**
- Google’s lock service for loosely coupled distributed systems.

16. **[Cassandra: Decentralized Structured Storage System (2009)](https://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf)**
- Facebook’s decentralized store, combining Dynamo-style distribution with a Bigtable-style data model.

17. **[Bitcoin: A Peer-to-Peer Electronic Cash System (Nakamoto, 2008)](https://bitcoin.org/bitcoin.pdf)**
- Introduces blockchain and proof-of-work consensus.

---

### **Where to Find More Papers**
- **[MIT’s Distributed Systems Reading List](https://pdos.csail.mit.edu/6.824/schedule.html)**
- **[University of Cambridge’s Systems Reading Group](https://www.cst.cam.ac.uk/systems-seminars)**
- **[Papers We Love (Distributed Systems)](https://github.com/papers-we-love/papers-we-love/tree/master/distributed_systems)**

---

### **Bonus Resources**
- **Books**:
- [*Designing Data-Intensive Applications* by Martin Kleppmann](https://dataintensive.net/) (cites many of these papers).
- **Courses**:
- [MIT 6.824: Distributed Systems](https://pdos.csail.mit.edu/6.824/) (labs build on MapReduce and Raft; papers such as Spanner appear in the readings).

These papers will help you internalize the principles behind systems like DynamoDB, Kubernetes, Kafka, and more. For interviews, focus on understanding the **trade-offs** and **design motivations** in these papers.
