Fix competency extraction feature #145

Merged: 14 commits merged on Aug 27, 2024

MichaelOwenDyer (Contributor) commented Aug 9, 2024

Summary by CodeRabbit

  • New Features

    • Introduced a competency extraction pipeline to enhance the processing of course descriptions.
    • Added new API endpoints for running competency extraction asynchronously.
    • Implemented a structured AI prompt for extracting competencies from course descriptions.
    • Introduced a new class for managing competency extraction status updates.
    • Enhanced data validation capabilities with new competency data models.
    • Simplified state management in the status update system for competency extraction.
  • Bug Fixes

    • Streamlined data structures in various data transfer objects for improved clarity and performance.
  • Documentation

    • Enhanced docstring readability in the course chat pipeline without altering functionality.
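The two status-update items above (a dedicated class for competency extraction status updates and simpler state management) can be pictured with the minimal sketch below. Only the name CompetencyExtractionCallback comes from this PR; StageState, Stage, CompetencyExtractionStatus, and the method names are hypothetical, and where the real callback would send each update to the client over HTTP, this sketch just records snapshots in a list.

```python
import copy
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class StageState(Enum):
    # Hypothetical stage states for this sketch.
    NOT_STARTED = "NOT_STARTED"
    IN_PROGRESS = "IN_PROGRESS"
    DONE = "DONE"
    ERROR = "ERROR"


@dataclass
class Stage:
    name: str
    state: StageState = StageState.NOT_STARTED
    message: Optional[str] = None


@dataclass
class CompetencyExtractionStatus:
    # Hypothetical status payload: one stage plus the extracted competencies.
    stages: List[Stage]
    result: List[dict] = field(default_factory=list)


class CompetencyExtractionCallback:
    """Tracks a single extraction stage (simplified, hypothetical version)."""

    def __init__(self) -> None:
        self.status = CompetencyExtractionStatus(stages=[Stage("Generating competencies")])
        # Stand-in for HTTP callbacks: every update is recorded as a snapshot.
        self.sent: List[CompetencyExtractionStatus] = []

    @property
    def stage(self) -> Stage:
        return self.status.stages[0]

    def _send(self) -> None:
        # Deep copy so later state changes do not alter already "sent" updates.
        self.sent.append(copy.deepcopy(self.status))

    def in_progress(self) -> None:
        self.stage.state = StageState.IN_PROGRESS
        self._send()

    def done(self, result: List[dict]) -> None:
        self.stage.state = StageState.DONE
        self.status.result = result
        self._send()

    def error(self, message: str) -> None:
        self.stage.state = StageState.ERROR
        self.stage.message = message
        self._send()
```

With a single stage, the callback never has to track which stage is current, which is one plausible way the state management stays simple.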

Some manual notes:

  • I decided not to use LangChain to implement this feature; I just call the CapabilityRequestHandler directly.
  • The original implementation of this feature called the LLM in a loop, once for each additional competency, which was horrible for performance. Now I call the LLM once and parse many competencies out of the returned string. In my testing, the pipeline now takes only a second or two to complete.
  • This PR makes only two changes to existing code: I moved the settings and initialStages attributes, which were duplicated across the subclasses of PipelineExecutionDTO, into the superclass. This lets me use PipelineExecutionDTO in my new CompetencyExtractionPipelineExecutionDTO (I opted for composition, not inheritance).
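The single-call approach described in the notes above can be sketched as follows: the LLM is asked once for all competencies, and the whole reply is parsed in one pass. The response format (a JSON array of objects with "title" and "description" keys), the fields of Competency, and the function parse_competencies are assumptions for illustration; the PR does not show the actual output format or the CapabilityRequestHandler call.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Competency:
    # Hypothetical fields; the real model in app/domain/data/competency_dto.py may differ.
    title: str
    description: str


def parse_competencies(response: str) -> List[Competency]:
    """Parse all competencies out of one LLM response.

    Assumes the prompt asked for a JSON array of objects with "title" and
    "description" keys. Malformed entries are skipped instead of failing the batch.
    """
    # Models sometimes wrap the JSON in prose; keep only the outermost [...] span.
    start, end = response.find("["), response.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        items = json.loads(response[start : end + 1])
    except json.JSONDecodeError:
        return []
    competencies: List[Competency] = []
    for item in items:
        if not isinstance(item, dict):
            continue
        title = str(item.get("title") or "").strip()
        if not title:
            continue  # a competency without a title is unusable
        description = str(item.get("description") or "").strip()
        competencies.append(Competency(title=title, description=description))
    return competencies


raw = (
    "Here are the competencies:\n"
    '[{"title": "Recursion", "description": "Write recursive functions."},'
    ' {"title": ""},'
    ' {"title": "Big-O notation"}]'
)
print(parse_competencies(raw))
```

Skipping malformed entries rather than raising keeps one bad item from discarding the rest of the batch, which matters more once every competency comes from the same response.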

coderabbitai bot (Contributor) commented Aug 9, 2024

Walkthrough

This PR adds competency extraction: new data transfer objects (DTOs) and a Competency data model, a pipeline class with its system prompt, and API endpoints for running the extraction asynchronously. It also moves the settings and initial_stages attributes shared by the pipeline execution DTOs into their common superclass and simplifies state management in the status update system.

Changes

Files and folders, with a summary of changes:

  • app/domain/__init__.py, app/domain/chat/chat_pipeline_execution_dto.py, app/domain/ingestion/ingestion_pipeline_execution_dto.py, app/domain/pipeline_execution_dto.py: Moved the settings and initial_stages attributes, previously duplicated in the chat and ingestion DTOs, into PipelineExecutionDTO.
  • app/domain/competency_extraction_pipeline_execution_dto.py, app/domain/status/competency_extraction_status_update_dto.py, app/domain/data/competency_dto.py: Introduced the CompetencyExtractionPipelineExecutionDTO and CompetencyExtractionStatusUpdateDTO DTOs and the Competency class for managing and validating extraction data.
  • app/pipeline/__init__.py, app/pipeline/competency_extraction_pipeline.py, app/pipeline/prompts/competency_extraction.py: Implemented the competency extraction pipeline class and the system prompt used for the LLM call.
  • app/web/routers/pipelines.py, app/web/status/status_update.py: Added API endpoints and the CompetencyExtractionCallback class for running competency extraction and reporting its status.
  • app/llm/external/openai_chat.py: Removed the import of ResponseFormat.
  • app/pipeline/chat/course_chat_pipeline.py: Reworded the docstring of lecture_content_retrieval for readability.
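The DTO refactoring summarized above can be illustrated with a minimal sketch. Plain dataclasses stand in for the real DTO classes so that it runs with the standard library alone; the attribute types and the field names chat_history, execution, and course_description are hypothetical, while settings, initial_stages, and the class names come from the PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PipelineExecutionDTO:
    # Attributes previously duplicated in each subclass now live here once.
    settings: Optional[dict] = None
    initial_stages: List[str] = field(default_factory=list)


@dataclass
class ChatPipelineExecutionDTO(PipelineExecutionDTO):
    # Existing subclasses still expose settings/initial_stages via inheritance.
    chat_history: List[str] = field(default_factory=list)  # hypothetical field


@dataclass
class CompetencyExtractionPipelineExecutionDTO:
    # Composition: the shared DTO is held as a field, not used as a base class.
    execution: PipelineExecutionDTO  # hypothetical field name
    course_description: str  # hypothetical field name


chat = ChatPipelineExecutionDTO(settings={"token": "abc"}, chat_history=["Hi"])
extraction = CompetencyExtractionPipelineExecutionDTO(
    execution=PipelineExecutionDTO(initial_stages=["Preparing"]),
    course_description="Introduction to algorithms",
)
print(chat.initial_stages, extraction.execution.initial_stages)
```

Because the chat subclass inherits settings and initial_stages, code that reads chat.settings keeps working after the move; the competency DTO reaches the same values through its execution field.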

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 7

Outside diff range, codebase verification and nitpick comments (1)
app/domain/chat/chat_pipeline_execution_dto.py (1)

5-5: Impact of Removing settings and initial_stages:

The removal of settings and initial_stages from the DTO affects several parts of the application. These attributes are referenced in multiple files, indicating their active role in the application's logic. Here are some key locations where they are used:

  • app/pipeline/lecture_ingestion_pipeline.py
  • app/pipeline/chat/lecture_chat_pipeline.py
  • app/web/status/status_update.py
  • app/web/routers/pipelines.py
  • app/web/routers/webhooks.py

Please ensure these dependencies are addressed to avoid breaking changes.

Analysis chain

Verify the impact of removing settings and initial_stages.

The removal of these attributes simplifies the DTO, but ensure that their absence does not affect other parts of the application.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of `settings` and `initial_stages` in the codebase.

# Test: Search for any references to the removed attributes. Expect: No references.
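# Note: ripgrep names the Python file type `py`, so `--type python` is rejected; the corrected script below uses a glob instead.
rg --type python 'settings|initial_stages'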

Length of output: 75


Script:

#!/bin/bash
# Corrected script to search for usage of `settings` and `initial_stages` in Python files.

# Search for any references to the removed attributes in Python files.
rg 'settings|initial_stages' --glob '*.py' -A 3

Length of output: 9614

Resolved review threads:

  • app/domain/__init__.py
  • app/pipeline/__init__.py
  • app/domain/data/competency_dto.py (outdated)
  • app/pipeline/competency_extraction_pipeline.py (outdated, two threads)
  • app/web/routers/pipelines.py (outdated)
  • app/web/status/status_update.py (outdated)
coderabbitai bot previously approved these changes on Aug 9, 2024 (three times) and on Aug 21, 2024
bassner (Member) left a comment

Code LGTM

bassner merged commit 5765c9c into main Aug 27, 2024
5 checks passed
bassner deleted the feature/competency-extraction branch August 27, 2024 11:38
3 participants