Summary:

Enterprises use distributed database systems to meet the demands of mixed or hybrid transaction/analytical processing (HTAP) workloads that contain both transactional (OLTP) and analytical (OLAP) requests. Distributed HTAP systems typically maintain a complete copy of data in row-oriented storage format that is well-suited for OLTP workloads and a second complete copy in column-oriented storage format optimised for OLAP workloads. Maintaining these data copies consumes significant storage space and system resources. Conversely, if a system stores data in a single format, OLTP or OLAP workload performance suffers.
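To make the row-versus-column trade-off concrete, here is a minimal sketch that stores the same tiny table both ways. The table contents and the names row_store, column_store, point_lookup and total_amount are hypothetical, chosen only for illustration. They are not taken from Proteus.

```python
# Hypothetical three-row table, used only for illustration.
rows = [
    {"id": 1, "customer": "ada", "amount": 30},
    {"id": 2, "customer": "bob", "amount": 45},
    {"id": 3, "customer": "cy", "amount": 25},
]

# Row-oriented layout: all fields of one record sit together.
row_store = [tuple(r.values()) for r in rows]

# Column-oriented layout: all values of one attribute sit together.
column_store = {key: [r[key] for r in rows] for key in rows[0]}


def point_lookup(store, position):
    """OLTP-style access: fetch one whole record in a single step."""
    return store[position]


def total_amount(store):
    """OLAP-style access: scan a single attribute across all records."""
    return sum(store["amount"])
```

A point lookup touches one entry of row_store, while the aggregate reads only the "amount" list of column_store. Answering either request from the other layout would mean touching every column or every row, which is the performance penalty the summary describes for single-format systems.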

In this interview, Michael talks about Proteus, a distributed HTAP database system that adaptively and autonomously selects and changes its storage layout to optimize for mixed workloads. Proteus generates physical execution plans that utilize storage-aware operators for efficient transaction execution. For HTAP workloads, Proteus delivers superior performance while providing OLTP and OLAP performance on par with designs specialized for either type of workload.

Questions:

0:56: Can you start off by explaining what a mixed workload is?

1:58: What is the challenge database systems face in trying to support these mixed workloads?

3:23: How have previous database systems tried to support mixed workloads?

5:19: What are the design goals of Proteus?

7:23: Can you elaborate more on the architecture of Proteus and how it makes decisions?

8:46: Can you dig into how you predict the transaction latency, what is the mechanism behind this?

10:35: It feels to me that you are accumulating a lot of metadata, which must have some overhead. How does this impact performance?

12:08: It sounds like the Adaptive Storage Advisor is a centralized coordinator. What are the limitations of this design choice?

13:35: Are we in the context of a data-center here or can Proteus handle a geo-distributed deployment?

14:34: Changing the storage layout has some implicit cost, how does Proteus decide whether a storage layout change is good or bad?

16:57: How does Proteus predict what the transaction is going to be?

18:46: How did you evaluate Proteus?

20:20: If you had to summarize your work, what is the one key insight the listener can take away?

21:07: Is Proteus publicly available?

21:39: What are the next steps?

22:57: What is the most unexpected lesson you have learned whilst working on distributed database systems?

24:21: Do you think a single system catering for both workload types is better than two specialized engines?

26:10: What attracted you to work on this topic?

Contact:

Website: https://cs.uwaterloo.ca/~mtabebe/
Email: mtabebe@uwaterloo.ca
GitHub: @mtabebe

Michael Abebe | Proteus: Autonomous Adaptive Storage for Mixed Workloads | #7

Related Episodes

Pat Helland | Scalable OLTP in the Cloud: What’s the BIG DEAL? | #50

Rui Liu | Towards Resource-adaptive Query Execution in Cloud Native Databases | #49

Yifei Yang | Predicate Transfer: Efficient Pre-Filtering on Multi-Join Queries | #48