V2: Improve AI review action (#10)
- Added tools (function calling); see the sketch after the file summary below.
- Added support for dynamic grading.

---------

Co-authored-by: Bodhish Thomas <[email protected]>
yash-learner and bodhish authored Mar 14, 2024
1 parent 3081628 commit 386743f
Showing 7 changed files with 246 additions and 133 deletions.
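The commit description mentions adding function-calling tools for dynamic grading, but the diff below only shows the call sites (`@reviewer.available_tools`, `@reviewer.tool_choice`). The following is a minimal sketch of what such a tool definition might look like; the tool name, parameter schema, and `Reviewer` methods shown here are assumptions for illustration, not the repository's actual implementation.

```ruby
# Hypothetical sketch of a Reviewer exposing an OpenAI function-calling tool
# for grading. Names and the parameter schema are assumptions.
class Reviewer
  def available_tools
    [
      {
        type: "function",
        function: {
          name: "grade_submission",
          description: "Record the review outcome for a student submission",
          parameters: {
            type: "object",
            properties: {
              status: { type: "string", enum: ["accepted", "rejected"] },
              feedback: { type: "string", description: "Feedback in Markdown" },
              grades: {
                type: "array",
                items: {
                  type: "object",
                  properties: {
                    evaluationCriterionId: { type: "string" },
                    grade: { type: "integer" }
                  }
                }
              }
            },
            required: ["status", "feedback"]
          }
        }
      }
    ]
  end

  def tool_choice
    # Force the model to call the grading tool instead of replying in free text.
    { type: "function", function: { name: "grade_submission" } }
  end
end
```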
34 changes: 17 additions & 17 deletions .github/workflows/test.yml
@@ -13,36 +13,36 @@ env:
TEST_MODE: true
WORKFLOW_FILE_PATH: ./.github/workflows/test.yml
jobs:
test: # make sure the action works on a clean machine without building
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: AI auto review
id: ai-review
uses: ./
with:
ROLE_PROMPT: "You are an advanced English Language Teaching Assistant AI. Your task involves reviewing and providing feedback on student submissions, paying meticulous attention to grammar, punctuation, and style errors."
env:
ROLE_PROMPT: "You are an advanced English Language Teaching Assistant AI. Your task involves grading student submissions, paying meticulous attention to grammar, punctuation, and style errors."
USER_PROMPT: |
The conversation should include the following:
The submission is about writing up a conversation between a student and an instructor at Pupilfirst, with at least 100 words.
- The specific Discord channel the conversation takes place in.
- The initial question, marked with "Student: ", outlining the student's doubt.
- The instructor's response, labelled with "Instructor: ", that provides a solution.
- A follow-up question for clarification, again starting with "Student: ", to delve into what the instructor meant.
Ensure that the student applies the lessons they learned in the current level:
- Provide context, steps taken, and error messages for both the initial question and the follow-up.
- Frame questions around the "why" and "how" aspects.
- Ask for additional examples, if necessary.
- Thank the instructor in a proper and considerate manner.
The feedback should focus on the following areas (with the ideal condition in brackets):
1. Providing Context & Background (The student delivers clear and detailed context, steps taken, and error messages).
2. Clarity (The conversation is clear and easy to understand throughout).
3. Expressing Thanks (The student thanks the instructor genuinely and appropriately).
4. Appropriate Tone & Etiquette (The student maintains a professional and respectful tone throughout the conversation).
When looking at the student's submission, you should check for the following requirements:
- Provided the context, steps taken, and error messages for both the initial question and the follow-up.
- Framed the questions around the "why" and "how" aspects.
- Asked for additional examples, if necessary.
- Thanked the instructor in a proper and considerate manner.
Make sure to identify and highlight all grammar, punctuation, and style errors.
The student's submission will be as follows:
As per the above requirements, add a grading to your feedback.
Choose the status as "accepted" and the grade of the submission as "1" for evaluation_criteria_id 3361 when
- The submission meets the ideal conditions in all areas.
Choose the status as "rejected" and send an empty [] for the grades property when
- The submission does not meet the ideal conditions in all areas.
${SUBMISSION}
The student's submission is
${SUBMISSION}
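Putting the grading instructions above together with the fields consumed by `app/pupilfirst_api.rb` further down (`result[:status]`, `result[:feedback]`, `result[:grades]`), a parsed tool-call result for an accepted submission might plausibly look like the sketch below. The function name and exact argument keys are assumptions; only `:status`, `:feedback`, and `:grades` are implied by the grading code in this commit.

```ruby
# Hypothetical parsed tool-call result for an accepted submission.
result = {
  function_name: "grade_submission",
  args: {
    status: "accepted",
    feedback: "Well done! Your conversation provides clear context and a polite follow-up.",
    grades: [
      { evaluationCriterionId: "3361", grade: 1 }
    ]
  }
}
```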
8 changes: 5 additions & 3 deletions README.md
@@ -22,15 +22,16 @@ The application uses the following environment variables for configuration:
10. `REVIEW_END_POINT`: This environment variable specifies the URL of the endpoint where the reviews are sent.
11. `REVIEW_BOT_USER_TOKEN`: This environment variable represents the token used for authorization when sending the reviews.
12. `WORKFLOW_FILE_PATH`: The path to your GitHub Actions workflow file. Default value is `.github/workflows/ci.js.yml`. Update this if you use a different path or file name for your workflow.
13. `SKIP_GRADING`: If set to `true`, the action will only create feedback in the LMS and not send a review to the review endpoint. Default value is `false`.

> Note: You must specify USER_PROMPT and ROLE_PROMPT unless you provide a SYSTEM_PROMPT.
> [!NOTE]
> You must specify USER_PROMPT and ROLE_PROMPT unless you provide a SYSTEM_PROMPT.
## How to Set Environment Variables

In GitHub Actions, you can set environment variables for a specific step in your workflow file (.github/workflows/workflow.yml). Here's an example:

> Note: Use `|` (Literal Block Scalar) instead of `>` (Folded Block Scalar) when writing prompts spanning multiple lines (see `USER_PROMPT` in the example below).
> [!CAUTION]
> Use `|` (Literal Block Scalar) instead of `>` (Folded Block Scalar) when writing prompts spanning multiple lines (see `USER_PROMPT` in the example below).
```yaml
name: "English Language Course L1 | Auto Grade"
@@ -57,6 +58,7 @@ jobs:
id: ai-review
uses: pupilfirst/ai-review-action@v1
env:
OPEN_AI_MODEL: gpt-4-turbo-preview
ROLE_PROMPT: "You are an advanced English Language Teaching Assistant AI. Your task involves reviewing and providing feedback on student submissions, paying meticulous attention to grammar, punctuation, and style errors."
USER_PROMPT: |
The conversation should include the following:
```
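As an aside on the block-scalar caution above, here is a minimal Ruby sketch of the difference between `|` and `>`; nothing in it is part of the action itself.

```ruby
require "yaml"

# "|" (literal) keeps line breaks; ">" (folded) collapses them into spaces.
literal = YAML.safe_load("prompt: |\n  line one\n  line two\n")["prompt"]
folded  = YAML.safe_load("prompt: >\n  line one\n  line two\n")["prompt"]

puts literal.inspect # => "line one\nline two\n"
puts folded.inspect  # => "line one line two\n"
```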
146 changes: 85 additions & 61 deletions app/open_ai_client.rb
@@ -1,31 +1,35 @@
require 'openai'
require 'yaml'
require "openai"
require "yaml"
require "json"

class OpenAIClient
def initialize
@client = OpenAI::Client.new

@config = extract_relevant_step_configuration
@model = @config.fetch('OPEN_AI_MODEL', "gpt-3.5-turbo")
@temperature = @config.fetch('OPEN_AI_TEMPERATURE', 0.1).to_f
@system_prompt = @config.fetch('SYSTEM_PROMPT', system_prompt_default)
@model = @config.fetch("OPEN_AI_MODEL", "gpt-3.5-turbo")
@temperature = @config.fetch("OPEN_AI_TEMPERATURE", 0.1).to_f
@system_prompt = @config.fetch("SYSTEM_PROMPT", system_prompt_default)

@submission = Submission.new
@reviewer = Reviewer.new(@submission)
end

def extract_relevant_step_configuration
# Load workflow YAML file from the path specified in the environment variable or the default path.
file_path = ENV.fetch('WORKFLOW_FILE_PATH', './.github/workflows/ci.js.yml')
file_path = ENV.fetch("WORKFLOW_FILE_PATH", "./.github/workflows/ci.js.yml")

# Find the job step that uses 'pupilfirst/ai-review-action' or has an ID containing 'ai-review'.
content = YAML.safe_load(File.read(file_path))
content = YAML.safe_load_file(file_path)

@config = content.dig('jobs', 'test', 'steps').find do |step|
( step['uses']&.include?('pupilfirst/ai-review-action') || step['id']&.include?('ai-review') )
end['env']
@config = content.dig("jobs", "test", "steps").find do |step|
(step["uses"]&.include?("pupilfirst/ai-review-action") || step["id"]&.include?("ai-review"))
end["env"]

if @config.nil?
p content

raise 'Could not read configuration from environment variables. Please check the workflow file.'
raise "Could not read configuration from environment variables. Please check the workflow file."
end

@config
@@ -34,78 +34,98 @@ def extract_relevant_step_configuration
def ask
puts prompt
response = @client.chat(
parameters: {
model: @model,
messages: [
{ role: "system", content: prompt }
],
temperature: @temperature,
})
parameters: {
model: @model,
messages: [
{role: "system", content: prompt}
],
tools: @reviewer.available_tools,
tool_choice: @reviewer.tool_choice,
temperature: @temperature
}
)
puts response
response.dig("choices", 0, "message", "content")

message = response.dig("choices", 0, "message")
if message["role"] == "assistant" && message["tool_calls"]
message["tool_calls"].each do |tool_call|
function_name = tool_call.dig("function", "name")
args_json = tool_call.dig("function", "arguments")
begin
args = JSON.parse(args_json, symbolize_names: true)
return {function_name: function_name, args: args}
rescue JSON::ParserError => e
puts "Error parsing JSON arguments: #{e.message}"
end
end
else
{function_name: "errored", args: {}}
end
end

def prompt
@system_prompt
.gsub("${ROLE_PROMPT}", default_role_prompt)
.gsub("${INPUT_DESCRIPTION}", default_input_prompt)
.gsub("${USER_PROMPT}", default_user_prompt)
.gsub("${SUBMISSION}", "#{Submission.new.checklist}")
.gsub("${OUTPUT_DESCRIPTION}", default_output_prompt)
.gsub("${ROLE_PROMPT}", default_role_prompt)
.gsub("${INPUT_DESCRIPTION}", default_input_prompt)
.gsub("${USER_PROMPT}", default_user_prompt)
.gsub("${SUBMISSION}", "#{@submission.checklist}")
.gsub("${EC_PROMPT}", default_evaluation_criteria_prompt)
.gsub("${SUBMISSION_EC}", "#{@submission.evaluation_criteria}")
end

def system_prompt_default
<<-SYSTEM_PROMPT
#{@config.fetch("ROLE_PROMPT", "${ROLE_PROMPT}")}
<<~SYSTEM_PROMPT
#{@config.fetch("ROLE_PROMPT", "${ROLE_PROMPT}")}
#{@config.fetch("INPUT_DESCRIPTION", "${INPUT_DESCRIPTION}")}
#{@config.fetch("INPUT_DESCRIPTION", "${INPUT_DESCRIPTION}")}
#{@config.fetch("USER_PROMPT", "${USER_PROMPT}")}
#{@config.fetch("USER_PROMPT", "${USER_PROMPT}")}
#{@config.fetch("OUTPUT_DESCRIPTION", "${OUTPUT_DESCRIPTION}")}
SYSTEM_PROMPT
#{@config.fetch("EC_PROMPT", "${EC_PROMPT}")}
SYSTEM_PROMPT
end

def default_role_prompt
<<-ROLE_PROMPT
You are an advanced Teaching Assistant AI. Your task involves reviewing and providing feedback on student submissions.
ROLE_PROMPT
<<~ROLE_PROMPT
You are an advanced Teaching Assistant AI. Your task involves reviewing and providing feedback on student submissions.
ROLE_PROMPT
end

def default_user_prompt
<<-USER_PROMPT
The student's submission will be as follows:
${SUBMISSION}
USER_PROMPT
<<~USER_PROMPT
The student's submission will be as follows:
${SUBMISSION}
USER_PROMPT
end

def default_input_prompt
<<-INPUT_PROMPT
The student's submissions will be an array of objects following the provided schema:
```json
{
"kind": "The type of answer - can be shortText, longText, link, files, or multiChoice",
"title": "The question that was asked of the student",
"result": "The student's response",
"status": "Field for internal use; ignore this field during your review"
}
```
INPUT_PROMPT
end
<<~INPUT_PROMPT
The student's submissions will be an array of objects following the provided schema:
def default_output_prompt
<<-OUTPUT_PROMPT
Please provide your response in the following JSON format. Adhere to the format strictly and escape all line-breaks within strings using \\\\n.
{
"kind": "The type of answer - can be shortText, longText, link, files, or multiChoice",
"title": "The question that was asked of the student",
"result": "The student's response",
"status": "Field for internal use; ignore this field during your review"
}
```json
{
"status": "\"passed\" or \"failed\"",
"feedback": "Detailed feedback for the student in markdown format. Aim for a human-like explanation as much as possible."
}
```
INPUT_PROMPT
end

If the student submission is not related to question, share generic feedback.
OUTPUT_PROMPT
def default_evaluation_criteria_prompt
if @submission.evaluation_criteria.any?
<<~EC_PROMPT
The following describes an array of objects where each object represents an evaluation criterion for a submission. Each criterion object includes the following key attributes:
- id: This key stores the identifier for the evaluation criteria, which can be either a numeric value or a string.
- name: The name of the evaluation criterion, describing the aspect of the submission it assesses.
- max_grade: The maximum grade that can be assigned for this criterion.
- grade_labels: An array of objects, each containing a 'grade' and a 'label'. 'grade' is an integer representing a possible grade for the criterion, and 'label' is a description of what this grade signifies.
Below is the structured representation of the evaluation criteria for the current submission:
${SUBMISSION_EC}
EC_PROMPT
else
""
end
end
end
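The evaluation-criteria prompt above describes the `${SUBMISSION_EC}` payload only in prose. As a hypothetical illustration, `@submission.evaluation_criteria` might return a structure like the one below; the concrete criterion is invented, and only the keys come from the description in the prompt.

```ruby
# Hypothetical example of what @submission.evaluation_criteria could contain,
# matching the keys described in default_evaluation_criteria_prompt.
[
  {
    "id" => "3361",
    "name" => "Providing Context & Background",
    "max_grade" => 1,
    "grade_labels" => [
      { "grade" => 1, "label" => "Clear context, steps taken, and error messages provided" }
    ]
  }
]
```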
49 changes: 22 additions & 27 deletions app/pupilfirst_api.rb
@@ -1,32 +1,31 @@
require 'json'
require 'graphql/client'
require 'graphql/client/http'
require_relative 'submission'
require "json"
require "graphql/client"
require "graphql/client/http"
require_relative "submission"

# Pupilfirst API example wrapper
module PupilfirstAPI

module API
HTTP = GraphQL::Client::HTTP.new(ENV.fetch('REVIEW_END_POINT')) do
HTTP = GraphQL::Client::HTTP.new(ENV.fetch("REVIEW_END_POINT")) do
def headers(_context)
{ "Authorization": "Bearer #{ENV.fetch('REVIEW_BOT_USER_TOKEN')}" }
{Authorization: "Bearer #{ENV.fetch("REVIEW_BOT_USER_TOKEN")}"}
end
end

Schema = GraphQL::Client.load_schema('/app/graphql_schema.json')
Schema = GraphQL::Client.load_schema("/app/graphql_schema.json")

Client = GraphQL::Client.new(schema: Schema, execute: HTTP)
end

GradeMutation = API::Client.parse <<-'GRAPHQL'
GradeMutation = API::Client.parse <<-GRAPHQL
mutation($submissionId: ID!, $grades: [GradeInput!], $checklist: JSON!, $feedback: String) {
createGrading(submissionId: $submissionId, grades: $grades, checklist: $checklist, feedback: $feedback) {
success
}
}
GRAPHQL

CreateFeedbackMutation = API::Client.parse <<-'GRAPHQL'
CreateFeedbackMutation = API::Client.parse <<-GRAPHQL
mutation($submissionId: ID!, $feedback: String!) {
createFeedback(submissionId: $submissionId, feedback: $feedback) {
success
@@ -37,56 +36,52 @@ def headers(_context)
class Grader
def initialize(submission = Submission.new)
@submission = submission
@test_mode = ENV.fetch('TEST_MODE', 'false') == 'true'
@test_mode = ENV.fetch("TEST_MODE", "false") == "true"
end

def grade(result)
return puts "Unknown status: #{result['status'].inspect}. Skipping grading..." unless valid_status?(result['status'])
return puts "Unknown status: #{result[:status].inspect}. Skipping grading..." unless valid_status?(result[:status])

variables = {
submissionId: @submission.id,
checklist: @submission.checklist,
feedback: result['feedback']
feedback: result[:feedback]
}

grades = grades_based_on(result['status'])
# We could use result[:grades] directly, but this method guards against the model hallucinating grades for a rejected submission.
grades = grades_based_on(result)

variables[:grades] = grades if grades.length > 0

log_variables(variables) if @test_mode
create_grading(variables) unless @test_mode
rescue StandardError => e
rescue => e
handle_error(e)
end

def add_feedback(result)
variables = {
submissionId: @submission.id,
feedback: result['feedback']
feedback: result[:feedback]
}

log_variables(variables) if @test_mode
create_feedback(variables) unless @test_mode
rescue StandardError => e
rescue => e
handle_error(e)
end

private

def valid_status?(status)
%w[passed failed].include?(status)
%w[accepted rejected].include?(status)
end

def grades_based_on(status)
if status == 'passed'
return @submission.evaluation_criteria.map do |criteria|
{
evaluationCriterionId: criteria['id'],
grade: criteria['max_grade']
}
end
def grades_based_on(result)
if result[:status] == "accepted"
result[:grades]
else
return []
[]
end
end

