Thomas Hütter | JEDI: These aren’t the JSON documents you’re looking for | #4 | Disseminate - bringing you the best Computer Science research.

Summary:

The JavaScript Object Notation (JSON) is a popular data format used in document stores to natively support semi-structured data.

In this interview, Thomas talks about how he addressed the problem of JSON similarity lookup queries: given a query document and a distance threshold, retrieve all documents that are within the threshold from the query document, i.e., get me all similar documents! Unlike other hierarchical formats such as XML, JSON supports both ordered and unordered sibling collections within a single document, which poses a new challenge to the tree model and the distance computation. Thomas talks about his proposal, the JSON tree, a lossless tree representation of JSON documents, and defines the JSON Edit Distance (JEDI), the first edit-based distance measure for JSON. He also talks about the development of QuickJEDI, an algorithm that computes JEDI by leveraging a new technique to prune expensive sibling matchings. It outperforms a baseline algorithm by an order of magnitude in runtime. The experimental evaluation shows that the solution scales to databases with millions of documents and JSON trees with tens of thousands of nodes.
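To make the ordered-versus-unordered point concrete, here is a minimal Python sketch of turning a JSON document into a tree. It is an illustration only: the node kinds (`object`, `array`, `key`, `literal`), the `Node` class, and the helpers `to_tree`, `canonical`, and `size` are assumptions made for this example, not the tree definition or the distance from the paper. It does not compute JEDI; it only shows that the children of an object can be treated as an unordered collection while the children of an array keep their order.

```python
import json
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical tree node; the four kinds are an assumption for this sketch."""
    kind: str                          # "object", "array", "key", or "literal"
    label: Any = None                  # key name or literal value, None otherwise
    children: Tuple["Node", ...] = ()
    ordered: bool = False              # True only when sibling order matters


def to_tree(value: Any) -> Node:
    """Convert a parsed JSON value into a tree of Node objects."""
    if isinstance(value, dict):
        # Object members become key nodes; their sibling order is irrelevant.
        keys = tuple(Node("key", k, (to_tree(v),)) for k, v in value.items())
        return Node("object", None, keys, ordered=False)
    if isinstance(value, list):
        # Array elements keep their position, so the siblings are ordered.
        return Node("array", None, tuple(to_tree(v) for v in value), ordered=True)
    return Node("literal", value)


def canonical(node: Node) -> tuple:
    """Order-insensitive form for unordered siblings, order-sensitive otherwise."""
    kids = [canonical(child) for child in node.children]
    if not node.ordered:
        kids.sort(key=repr)            # sorting removes the incidental key order
    return (node.kind, repr(node.label), tuple(kids))


def size(node: Node) -> int:
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in node.children)


a = to_tree(json.loads('{"name": "Ada", "tags": ["x", "y"]}'))
b = to_tree(json.loads('{"tags": ["x", "y"], "name": "Ada"}'))   # keys swapped
c = to_tree(json.loads('{"name": "Ada", "tags": ["y", "x"]}'))   # array swapped

print(canonical(a) == canonical(b))   # swapping object keys changes nothing
print(canonical(a) == canonical(c))   # swapping array elements does
print(size(a))
```

In this sketch, documents `a` and `b` map to the same canonical tree, while `a` and `c` do not. An edit distance over such trees has to respect the same distinction, which is where matching unordered siblings becomes the expensive part.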

Questions:

0:47: Can you explain to the listeners what JSON is?

1:14: What is the problem you're trying to solve in your research?

1:48: What was the reason JSON was under-researched?

2:13: What is the motivation for this research? Why do we need it?

2:52: What was the solution you developed to solve this problem?

4:35: How does tree edit distance work?

5:18: How do we go from tree edit distance to JEDI?

6:29: How did you evaluate JEDI?

8:31: Do other database systems provide similar functionality?

9:33: Can you tell the listeners more about AsterixDB?

10:20: What was the most challenging aspect of working on this topic?

10:59: What are the future plans for this research?

11:56: What attracted you to working on similarity queries?

Links:

