Tracking issue for improving TinySQL as a learning-friendly mini distributed relational database #83

Open
1 of 15 tasks
rebelice opened this issue Aug 9, 2021 · 2 comments

Comments

@rebelice
Collaborator

rebelice commented Aug 9, 2021

This issue is part of the effort towards Talent Plan v3.0.

Based on user feedback and my own investigation, TinySQL has serious issues that keep it from being a learning-friendly mini distributed relational database:

  1. Not mini. TinySQL has more than 100,000 lines of code. It is essentially a copy of TiDB with part of the code deleted, so it still contains a lot of irrelevant code and design.
  2. Unfriendly documentation. The documents only briefly explain the relevant knowledge topics and do not explain the project structure.
  3. Poor course design. Each lab covers very broad topics, but the part that needs to be implemented is small.
  4. Poor comments. They do not help readers understand the code.

To solve these problems, I will redesign and reimplement TinySQL. The main planned improvements are as follows:

  • Redesign the course.
    • Divide TinySQL into five stages. Each stage has a clear target and an iconic function, and each stage builds on and extends the previous one.
    • At stage one, we want TinySQL to be as simple as possible. As more stages are completed, we will add the necessary functions to make it a true distributed relational database.
    • The specific stage division is at the end of this issue.
  • Adopt an incremental framework.
    • Initially, the course framework is empty, and each stage/substage introduces only the framework code that it requires. This keeps the framework code concise and clearly shows what each stage/substage adds.
  • Optimize documentation and comments.
    • The documentation layout:
    ## Stage
    
    ### Introduction
    #### Objectives
    #### Materials
    
    ### Topic 1
    #### Knowledge topic
    #### Related code
    
    ### Exercises
    ### References
    

The stage design is as follows:

  • Stage 1: read-only relational database
    • Target: the ability to read data using the KV engine API
    • Iconic function: the ability to handle simple SELECT statements
    • Knowledge topic:
      • parser
      • data mapping from the relational model to KV
      • operator generation
  • Stage 2: insert and update
    • Target: the ability to write data using the KV engine API
    • Iconic function: the ability to handle simple INSERT/UPDATE statements
    • Knowledge topic:
      • volcano model
  • Stage 3: DDL
    • Target: the ability to process DDL online
    • Iconic function: the ability to process CREATE/DROP TABLE/INDEX online
    • Knowledge topic:
      • online DDL algorithm
  • Stage 4: Optimizer
    • Target: implement an optimizer that can choose an appropriate index and join order
    • Iconic function:
      • ability to collect statistics
      • ability to choose an appropriate index and join order
    • Knowledge topic:
      • SQL optimization
      • statistics
      • System R optimizer
  • Stage 5: Calculation optimization
    • Target: optimize the calculation framework to improve performance
    • Iconic function:
      • vectorization
      • Massively Parallel Processing (MPP)
    • Knowledge topic:
      • vectorization
      • MPP

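The volcano model listed under Stage 2 can be illustrated with a small Go sketch. Every operator exposes the same iterator interface, and the root pulls one row at a time from its children. The `Executor`, `SliceScan`, `Filter`, and `collect` names below are assumptions made for this example and are not taken from the TinySQL code base.

```go
package main

import "fmt"

// Row is a hypothetical row representation: one int64 per column.
type Row []int64

// Executor is the iterator interface of the volcano model.
// Next returns nil when the operator has no more rows.
type Executor interface {
	Open()
	Next() Row
	Close()
}

// SliceScan is a leaf operator that reads rows from memory.
type SliceScan struct {
	rows []Row
	pos  int
}

func (s *SliceScan) Open()  { s.pos = 0 }
func (s *SliceScan) Close() {}
func (s *SliceScan) Next() Row {
	if s.pos >= len(s.rows) {
		return nil
	}
	r := s.rows[s.pos]
	s.pos++
	return r
}

// Filter pulls rows from its child and keeps those matching pred.
type Filter struct {
	child Executor
	pred  func(Row) bool
}

func (f *Filter) Open()  { f.child.Open() }
func (f *Filter) Close() { f.child.Close() }
func (f *Filter) Next() Row {
	for r := f.child.Next(); r != nil; r = f.child.Next() {
		if f.pred(r) {
			return r
		}
	}
	return nil
}

// collect drains an executor tree, pulling one row at a time from the root.
func collect(e Executor) []Row {
	e.Open()
	defer e.Close()
	var out []Row
	for r := e.Next(); r != nil; r = e.Next() {
		out = append(out, r)
	}
	return out
}

func main() {
	scan := &SliceScan{rows: []Row{{1, 10}, {2, 25}, {3, 40}}}
	plan := &Filter{child: scan, pred: func(r Row) bool { return r[1] > 20 }}
	// Prints the rows whose second column exceeds 20.
	fmt.Println(collect(plan))
}
```

Because every operator implements the same interface, new operators can be composed into a plan tree without changing existing ones. The cost is one `Next` call per row per operator, which is the overhead that the vectorization topic in Stage 5 targets.
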
Issues

@yanguwan
Contributor

Thanks, Rebelice. The topic list looks fine for the next wave of Talent Plan.

@feitian124

👍
