Python clustering Module Review #908

Open
wants to merge 9 commits into
base: main

dsbuddy
Collaborator

@dsbuddy dsbuddy commented Mar 21, 2024

No description provided.

@dsbuddy dsbuddy requested a review from rosemm March 21, 2024 15:17
@drelliche
Contributor

drelliche commented Jun 12, 2024

This is failing the "Check modules in PR for version incrementation" check because the version number isn't 1.0.0. I suspect that the other modules are failing this check for the same reason.

The "Details" link to the right of the failed check brings you to the error message.

@drelliche
Contributor

Structural Comments:

Right now the "Python Implementation of K-Means Clustering" is broken into 8 steps, but all are on the same page. I suggest breaking this page apart so that each step has its own page. Pyodide cells will remember what was done on previous pages as long as people navigate through the module using the internal module buttons (see the pandas module for a working example).

For each step, you already have a brief description of what the code is doing, and the code block. It would be helpful to learners to have for each step:

  • Description of what the code is doing (already there)
  • Reminder of why the code should be doing it (e.g. why is it important to normalize the data in step 4?)
  • Well-commented code (already there)
  • Some output so that learners can get a feel for what the code is doing. This might be as simple as adding print(results) in steps 6 and 7, or print(normalized_data) in step 4.
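
As a rough illustration of the last bullet, a step-4 cell could end by printing the normalized values. This is only a sketch: it assumes min-max scaling, uses made-up numbers, and the names `min_max_normalize` and `raw_data` are hypothetical, so the module's actual code and normalization method may differ.

```python
def min_max_normalize(rows):
    """Rescale each column of `rows` to the range [0, 1] (assumed method)."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # Guard against constant columns to avoid dividing by zero.
            (value - lo) / (hi - lo) if hi != lo else 0.0
            for value, lo, hi in zip(row, mins, maxs)
        ])
    return normalized


# Made-up example values, not taken from the module's data files.
raw_data = [[50, 120], [60, 140], [70, 160]]
normalized_data = min_max_normalize(raw_data)

# The suggested addition: show learners what the step produced.
print(normalized_data)  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

Seeing both columns land on the same 0 to 1 scale also gives a natural place to restate why normalization matters before a distance-based method like K-Means.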

Part of the tricky balance is helping learners understand enough to use the clustering tools correctly, without making them reinvent the wheel.

It might also be helpful to have a "full code" page at the end with all of the code in a single cell block, so that people who want to can copy and paste it all at once. This isn't a great pedagogical tool but will make the module a more useful reference for people to return to.

The "Conclusion" page is a good resource. I suggest changing the title to "Key Takeaways" and perhaps splitting it into two pages: "Key Takeaways" and "Beyond K-Means Clustering".

Quizzes and Learning Objectives

Since this doesn't seem to be an exercise module, i.e. there is no "go do this on your own" component, the module standards require there to be some quiz questions, ideally one for each learning objective. I don't know whether the learning objectives shifted a little with the creation of an "Intro to clustering" module, but that would be a good thing to review if you have significantly changed the module since the learning objectives were originally crafted.

For example, the objective "Learn how to implement the K-Means clustering algorithm in Python" could have a corresponding quiz question where partial code is provided and the learners have to fill in the blanks (see an example).
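
To make the idea concrete, here is one possible shape for such a question, shown with the answers filled in. It is a hypothetical sketch, not code from the module: the function name `assign_clusters` and the sample points are made up, and it covers only the assignment step of K-Means. In the quiz itself, the two expressions marked as blanks would be replaced with placeholders for learners to complete.

```python
import math


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for point in points:
        # BLANK 1: compute the distance from this point to every centroid.
        distances = [math.dist(point, centroid) for centroid in centroids]
        # BLANK 2: pick the index of the smallest distance.
        labels.append(distances.index(min(distances)))
    return labels


# Made-up data: two points near each of two centroids.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.5), (8.5, 9.0)]
print(assign_clusters(points, centroids))  # [0, 0, 1, 1]
```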

There is some great content in here, and I look forward to getting it in the hands of our learners!

@drelliche
Contributor

In reading through how you split the two clustering modules, I can see that a lot of details were moved into the intro to clustering module. It definitely makes sense to have them separate, and there might even end up being a small feeling of repetition between the "why" portions of both of them. That repetition would be good for people who only do one of the modules, and probably even better as learning reinforcement for people who do them both. I'll put more comments on the other modules soon.

@drelliche drelliche mentioned this pull request Jun 21, 2024
@drelliche
Contributor

@dsbuddy Where did the data in heart.csv and polyps.csv come from? Could you give me the sources so we can cite them? Alternatively, if this is data that you fabricated for the examples, we should state that.

3 participants