Summary:
Enterprises use distributed database systems to meet the demands of mixed or hybrid transaction/analytical processing (HTAP) workloads that contain both transactional (OLTP) and analytical (OLAP) requests. Distributed HTAP systems typically maintain one complete copy of the data in a row-oriented storage format that is well suited to OLTP workloads and a second complete copy in a column-oriented storage format optimized for OLAP workloads. Maintaining both copies consumes significant storage space and system resources. Conversely, if a system stores data in a single format, either OLTP or OLAP performance suffers.
In this interview, Michael talks about Proteus, a distributed HTAP database system that adaptively and autonomously selects and changes its storage layout to optimize for mixed workloads. Proteus generates physical execution plans that utilize storage-aware operators for efficient transaction execution. For HTAP workloads, Proteus delivers superior performance while providing OLTP and OLAP performance on par with designs specialized for either type of workload.
Questions:
0:56: Can you start off by explaining what a mixed workload is?
1:58: What is the challenge database systems face in trying to support these mixed workloads?
3:23: How have previous database systems tried to support mixed workloads?
5:19: What are the design goals of Proteus?
7:23: Can you elaborate more on the architecture of Proteus and how it makes decisions?
8:46: Can you dig into how you predict transaction latency? What is the mechanism behind this?
10:35: It feels to me that you are accumulating a lot of metadata, which must have some overhead. How does this impact performance?
12:08: It sounds like the Adaptive Storage Advisor is a centralized coordinator. What are the limitations of this design choice?
13:35: Are we in the context of a data center here, or can Proteus handle a geo-distributed deployment?
14:34: Changing the storage layout has some implicit cost. How does Proteus decide whether a storage layout change is good or bad?
16:57: How does Proteus predict what the transaction is going to be?
18:46: How did you evaluate Proteus?
20:20: If you had to summarize your work, what is the one key insight the listener can take away?
21:07: Is Proteus publicly available?
21:39: What are the next steps?
22:57: What is the most unexpected lesson you have learned whilst working on distributed database systems?
24:21: Do you think a single system catering for both workload types is better than two specialized engines?
26:10: What attracted you to work on this topic?
Links:
- Paper: https://cs.uwaterloo.ca/~mtabebe/publications/abebeProteus2022SIGMOD.pdf
- Presentation: https://www.youtube.com/watch?v=qbe29viYTas
- University of Waterloo Data Systems Group: https://uwaterloo.ca/data-systems-group/
Contact:
- Website: https://cs.uwaterloo.ca/~mtabebe/
- Email: mtabebe@uwaterloo.ca
- GitHub: @mtabebe