Michael Abebe | Proteus: Autonomous Adaptive Storage for Mixed Workloads | #7
Disseminate | July 18, 2022 | 27:57 | 25.59 MB

Summary:

Enterprises use distributed database systems to meet the demands of mixed or hybrid transaction/analytical processing (HTAP) workloads that contain both transactional (OLTP) and analytical (OLAP) requests. Distributed HTAP systems typically maintain one complete copy of the data in a row-oriented storage format that is well-suited to OLTP workloads and a second complete copy in a column-oriented storage format optimized for OLAP workloads. Maintaining both copies consumes significant storage space and system resources. On the other hand, if a system stores data in a single format, either OLTP or OLAP performance suffers.
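
To make the row-versus-column trade-off concrete, here is a minimal Python sketch. It is an illustration only and not Proteus code: the toy table and the helper names (`to_columns`, `point_lookup_row`, `total_amount_columns`) are assumptions invented for this example.

```python
# Hypothetical toy table stored row by row (one dict per record).
rows = [
    {"id": 1, "customer": "a", "amount": 10.0},
    {"id": 2, "customer": "b", "amount": 25.5},
    {"id": 3, "customer": "a", "amount": 4.5},
]


def to_columns(rows):
    """Pivot a row-oriented table into a column-oriented one (one list per attribute)."""
    return {name: [r[name] for r in rows] for name in rows[0]}


columns = to_columns(rows)


def point_lookup_row(rows, key):
    # OLTP-style access: fetch one whole record; all of its fields sit together.
    return next(r for r in rows if r["id"] == key)


def total_amount_columns(columns):
    # OLAP-style access: aggregate one attribute; only a single list is scanned.
    return sum(columns["amount"])
```

The row layout keeps each record contiguous, which suits reading or updating a single record. The column layout keeps each attribute contiguous, which suits scans and aggregates. Holding both `rows` and `columns` at once mirrors the two-copy approach the summary describes, along with its storage cost.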


In this interview, Michael talks about Proteus, a distributed HTAP database system that adaptively and autonomously selects and changes its storage layout to optimize for mixed workloads. Proteus generates physical execution plans that utilize storage-aware operators for efficient transaction execution. For HTAP workloads, Proteus delivers superior performance while providing OLTP and OLAP performance on par with designs specialized for either type of workload.


Questions:

0:56: Can you start off by explaining what a mixed workload is? 

1:58: What is the challenge database systems face in trying to support these mixed workloads? 

3:23: How have previous database systems tried to support mixed workloads? 

5:19: What are the design goals of Proteus? 

7:23: Can you elaborate more on the architecture of Proteus and how it makes decisions? 

8:46: Can you dig into how you predict transaction latency? What is the mechanism behind this?

10:35: It feels to me that you are accumulating a lot of metadata, which must have some overhead. How does this impact performance?

12:08: It sounds like the Adaptive Storage Advisor is a centralized coordinator. What are the limitations of this design choice?

13:35: Are we in the context of a data center here, or can Proteus handle a geo-distributed deployment?

14:34: Changing the storage layout has some implicit cost. How does Proteus decide whether a storage layout change is good or bad?

16:57: How does Proteus predict what the transaction is going to be?

18:46: How did you evaluate Proteus?

20:20: If you had to summarize your work, what is the one key insight the listener can take away?

21:07: Is Proteus publicly available? 

21:39: What are the next steps? 

22:57: What is the most unexpected lesson you have learned whilst working on distributed database systems? 

24:21: Do you think a single system catering for both workload types is better than two specialized engines? 

26:10: What attracted you to work on this topic?


Links:


Contact:



