diff --git a/Paper-Shelf/distributed_systems.md b/Paper-Shelf/distributed_systems.md
new file mode 100644
index 0000000..e1f06e9
--- /dev/null
+++ b/Paper-Shelf/distributed_systems.md
@@ -0,0 +1,79 @@
+Here’s a curated list of **seminal papers** in distributed systems that are frequently referenced in interviews and system design discussions. These papers cover foundational theory, production systems, and influential later designs:
+
+---
+
+### **Foundational Theory & Concepts**
+1. **[Time, Clocks, and the Ordering of Events in a Distributed System (Lamport, 1978)](https://lamport.azurewebsites.net/pubs/time-clocks.pdf)**
+   - Introduces logical (**Lamport**) clocks and the happened-before relation, which orders events causally without synchronized physical clocks. A must-read for understanding event ordering.
+
+2. **[Impossibility of Distributed Consensus with One Faulty Process (FLP Impossibility, 1985)](https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf)**
+   - Proves that no deterministic algorithm can guarantee consensus in a fully asynchronous system if even one process may crash. Critical for understanding why practical consensus algorithms rely on timeouts, randomization, or partial synchrony.
+
+3. **[CAP Theorem (Brewer, 2000)](https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed/)**
+   - Brewer presented CAP as a conjecture in his 2000 PODC keynote; the link is his 2012 follow-up article, *CAP Twelve Years Later: How the "Rules" Have Changed*. Explains the trade-offs between consistency, availability, and partition tolerance.
+
+4. **[PACELC: Revisiting the CAP Theorem (Abadi, 2012)](https://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf)**
+   - Extends CAP: during a partition (P), a system trades availability against consistency (A/C); else (E), in partition-free operation, it trades **latency** against consistency (L/C).
+
+---
+
+### **Consensus & Coordination**
+5. **[The Part-Time Parliament (Paxos, Lamport, 1998)](https://lamport.azurewebsites.net/pubs/lamport-paxos.pdf)**
+   - The original Paxos paper. Required reading for understanding consensus in fault-tolerant systems.
+
+6. **[In Search of an Understandable Consensus Algorithm (Raft, 2014)](https://raft.github.io/raft.pdf)**
+   - Introduces **Raft**, a consensus algorithm designed to be easier to understand than Paxos. Used in systems such as etcd, the key-value store behind Kubernetes.
+
+7. **[Viewstamped Replication (VR, 1988)](https://pmg.csail.mit.edu/papers/vr-revisited.pdf)**
+   - Originally published in 1988; the link is the updated 2012 presentation, *Viewstamped Replication Revisited*. Developed around the same time as Paxos and a precursor to Raft. Explains replication and consensus for state machine replication.
+
+---
+
+### **Distributed Storage Systems**
+8. **[Dynamo: Amazon’s Highly Available Key-Value Store (2007)](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf)**
+   - Combines consistent hashing, vector clocks, sloppy quorums, and eventual consistency in a decentralized architecture. Inspired Cassandra, Riak, and more.
+
+9. **[Bigtable: A Distributed Storage System for Structured Data (2006)](https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf)**
+   - Google’s paper on Bigtable, a foundational system for wide-column NoSQL databases (e.g., HBase, Cassandra).
+
+10. **[Spanner: Google’s Globally Distributed Database (2012)](https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf)**
+    - Introduces **TrueTime**, a clock API that exposes bounded uncertainty, and uses it to provide externally consistent transactions across datacenters. Critical for understanding distributed transactions.
+
+11. **[The Google File System (2003)](https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf)**
+    - Inspired Hadoop HDFS. Explains distributed file systems for large-scale data processing.
+
+---
+
+### **Distributed Computing Models**
+12. **[MapReduce: Simplified Data Processing on Large Clusters (2004)](https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf)**
+    - Google’s paper on MapReduce, the programming model behind Hadoop MapReduce and later batch processing systems.
+
+13. **[Resilient Distributed Datasets (Spark, 2012)](https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf)**
+    - Introduces RDDs, the fault-tolerant in-memory abstraction underlying Apache Spark. Key for modern data processing.
+
+---
+
+### **Other Notable Papers**
+14. **[Kafka: A Distributed Messaging System (2011)](https://www.kai-waehner.de/blog/2021/04/20/apache-kafka-10-years-later-linkedin-original-paper-kafka-connect-turbine/)**: LinkedIn’s original design for Kafka.
+15. **[The Chubby Lock Service (2006)](https://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf)**: Google’s coarse-grained lock service for loosely coupled distributed systems.
+16. **[Cassandra: Decentralized Structured Storage System (2009)](https://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf)**
+17. **[Bitcoin: A Peer-to-Peer Electronic Cash System (Nakamoto, 2008)](https://bitcoin.org/bitcoin.pdf)**
+    - Introduces the blockchain and applies proof-of-work to reach consensus among untrusted nodes.
+
+---
+
+### **Where to Find More Papers**
+- **[MIT’s Distributed Systems Reading List](https://pdos.csail.mit.edu/6.824/schedule.html)**
+- **[University of Cambridge’s Systems Reading Group](https://www.cst.cam.ac.uk/systems-seminars)**
+- **[Papers We Love (Distributed Systems)](https://github.com/papers-we-love/papers-we-love/tree/master/distributed_systems)**
+
+---
+
+### **Bonus Resources**
+- **Books**:
+  - [*Designing Data-Intensive Applications* by Martin Kleppmann](https://dataintensive.net/) (cites many of these papers).
+- **Courses**:
+  - [MIT 6.824: Distributed Systems](https://pdos.csail.mit.edu/6.824/) (covers papers like Raft, MapReduce, and Spanner in its lectures and labs).
+
+These papers will help you internalize the principles behind systems like DynamoDB, Kubernetes, Kafka, and more. For interviews, focus on the **trade-offs** and **design motivations** each paper describes.
+