Test suite v0.4 #637

Merged (46 commits, Jan 15, 2025)

@teocns (Contributor) commented Jan 10, 2025

Overview

This PR reorganizes the test infrastructure into separate unit and integration test suites, with a focus on testing provider integrations and API functionality.

Infrastructure Changes

  • Split tests into dedicated unit/ and integration/ directories
  • Configure separate CI jobs for unit and integration tests
  • Add VCR.py for HTTP interaction recording/replay
  • Centralize test fixtures in conftest.py
  • Add development tools (pytest-sugar, pdbpp)
  • Update the pytest configuration to improve test isolation
  • Set a 5-minute test timeout
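For readers new to VCR.py: HTTP interactions are recorded once into "cassette" files and then replayed, so replayed runs need neither network access nor live API keys. The sketch below is an illustration under assumptions, not this PR's actual configuration. The helper `build_vcr_config`, the `VCR_RECORD` environment variable, the cassette directory and the ignored host are hypothetical; the option names themselves (`cassette_library_dir`, `record_mode`, `match_on`, `filter_headers`, `ignore_localhost`, `ignore_hosts`, `decode_compressed_response`) are standard `vcr.VCR` parameters.

```python
import os


def build_vcr_config(cassette_dir: str = "tests/integration/cassettes") -> dict:
    """Hypothetical helper returning keyword arguments for vcr.VCR(...)."""
    # Re-record only when explicitly requested (VCR_RECORD is an assumed name).
    record_mode = "all" if os.environ.get("VCR_RECORD") == "1" else "none"
    return {
        "cassette_library_dir": cassette_dir,
        # "none" replays existing cassettes and errors on unrecorded requests.
        "record_mode": record_mode,
        # Attributes used to match a live request to a recorded one.
        "match_on": ["method", "scheme", "host", "port", "path", "query"],
        # Strip credentials so they are never written into cassettes.
        "filter_headers": ["authorization", "api-key", "x-api-key"],
        # Let local and telemetry traffic pass through unrecorded.
        "ignore_localhost": True,
        "ignore_hosts": ["api.agentops.ai"],  # assumed telemetry host
        "decode_compressed_response": True,
    }
```

The resulting dict could be passed as `vcr.VCR(**build_vcr_config())`, with the returned object's `use_cassette("openai_chat.yaml")` used as a decorator or context manager around a provider call. With `record_mode="none"`, a request missing from the cassette raises an error instead of silently reaching the network.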

Test Fixtures

  • Implement JWT token management
  • Add provider response spying capabilities
  • Centralize mock_req fixture
  • Add session management utilities
  • Add package availability control
  • Set up VCR ignore hosts and options
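The package availability control fixture is not shown in the PR description. One common way to simulate a missing optional dependency (for example, to check that an integration is skipped when its provider SDK is not installed) is to put `None` into `sys.modules`, which makes the import system raise `ImportError` for that name. A minimal sketch, where the helper name `package_unavailable` is hypothetical:

```python
import sys
from contextlib import contextmanager
from unittest import mock


@contextmanager
def package_unavailable(*names: str):
    """Hypothetical helper: make `import <name>` raise ImportError inside the block.

    A None entry in sys.modules tells the import system the module is missing;
    mock.patch.dict restores the original sys.modules entries on exit.
    """
    with mock.patch.dict(sys.modules, {name: None for name in names}):
        yield
```

Because `mock.patch.dict` restores the original entries when the block exits (even on error), the package becomes importable again afterwards; a pytest fixture could simply wrap this context manager.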

Core Functionality Tests

  • Improve session handling and teardown
  • Enhance async loop lifecycle management
  • Add concurrent API request handling tests
  • Implement proper singleton cleanup
  • Move telemetry tests to unit test directory
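Singletons that survive from one test to the next are a classic source of order-dependent failures. A minimal sketch of the cleanup pattern, assuming a hypothetical `Client` singleton (not the AgentOps class) that exposes a `reset()` classmethod. In pytest the reset would typically run after the `yield` of an autouse fixture; the hypothetical `run_isolated` helper imitates that teardown here without depending on pytest.

```python
import threading


class Client:
    """Hypothetical process-wide singleton standing in for an SDK client."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking so concurrent callers share one instance.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
                    cls._instance.sessions = []
        return cls._instance

    @classmethod
    def reset(cls):
        # Drop the cached instance so the next test starts from a clean slate.
        with cls._lock:
            cls._instance = None


def run_isolated(test_fn):
    """Hypothetical stand-in for fixture teardown: always reset afterwards."""
    try:
        return test_fn()
    finally:
        Client.reset()
```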

Provider Integration Tests

OpenAI

  • Basic sync/async completions
  • Streaming responses
  • Assistants API integration

Mistral

  • Sync completions
  • Async completions
  • Streaming responses

Cohere

  • Chat completions (sync)
  • Chat completions (async)
  • Stream handling
  • Instrumentation control

AI21

  • Sync completions
  • Async completions
  • Stream management

Groq

  • Basic completions
  • Session management
  • Stream handling

LlamaStack

  • Agent configuration
  • Shield management
  • Model selection

CrewAI Integration Tests

  • Basic Setup
    • Test initialization
    • Test session creation
    • Test auto-end behavior
  • Session Management
    • Test crew lifecycle
    • Test task completion tracking
    • Test manual session control
  • Agent Monitoring
    • Test single agent tracking
    • Test multi-agent tracking
    • Test task delegation tracking
  • Tool Integration
    • Test built-in tools
    • Test custom tools
    • Test tool error handling
  • Example Workflows
    • Test job posting flow
    • Test markdown validator
    • Test Instagram post creation

API Server Tests

  • Session lifecycle in API context
  • Tool recording
  • Event validation
  • Multi-session handling
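Multi-session handling essentially checks that concurrent requests do not leak events into each other's sessions. The sketch below uses `asyncio.gather` with a hypothetical `Session` dataclass (not the AgentOps API) to run several simulated requests that interleave on one event loop, then checks that each session received only its own events.

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical session object, not the AgentOps Session class."""

    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    events: list = field(default_factory=list)
    ended: bool = False

    def record(self, event: str) -> None:
        if self.ended:
            raise RuntimeError("cannot record on an ended session")
        self.events.append(event)

    def end(self) -> None:
        self.ended = True


async def handle_request(session: Session, n_tools: int) -> Session:
    # Simulate a request that records several tool events, yielding to the
    # event loop between them so concurrent requests interleave.
    for i in range(n_tools):
        session.record(f"tool:{i}")
        await asyncio.sleep(0)
    session.end()
    return session


async def run_concurrently(n_sessions: int) -> list:
    sessions = [Session() for _ in range(n_sessions)]
    # gather returns results in the order the coroutines were passed.
    return await asyncio.gather(
        *(handle_request(s, n_tools=idx + 1) for idx, s in enumerate(sessions))
    )
```

A test would then call `asyncio.run(run_concurrently(3))` and assert on event counts, end state and unique session IDs.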

Documentation

  • Update CONTRIBUTING.md with new test structure
  • Document VCR usage and best practices
  • Add provider test documentation

Integration

  • Adapt and integrate tests from PR #603 (Add LLM Integration Tests)
  • Update LangChain handler tests for current API
  • Implement remaining provider integration tests

CrewAI Integration Tests

Basic Integration Tests

  • Test initialization with CrewAI
    • Verify AgentOps initialization before Crew constructor
    • Test auto_start_session behavior
    • Test skip_auto_end_session parameter
    • Verify proper session creation

Session Management Tests

  • Test session lifecycle with CrewAI
    • Verify session starts with Crew initialization
    • Test automatic session ending when tasks complete
    • Verify session state after crew.kickoff()
    • Test manual session ending

Event Recording Tests

  • Test LLM event recording
    • Verify LLM calls are properly tracked
    • Test streaming responses tracking
    • Verify async LLM calls tracking
    • Test multiple LLM calls within one session
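Verifying that LLM calls are tracked usually means spying on the provider method without changing its behavior. A minimal sketch using `unittest.mock.MagicMock(wraps=...)`, where `FakeChatClient` is a hypothetical stand-in for a provider SDK client and `spy_on_create` is an illustrative helper, not the PR's spying fixture:

```python
from unittest import mock


class FakeChatClient:
    """Hypothetical stand-in for a provider SDK client."""

    def create(self, prompt: str) -> dict:
        return {"content": prompt.upper(), "tokens": len(prompt.split())}


def spy_on_create(client: FakeChatClient, prompts: list):
    # wraps= forwards each call to the real method and returns its result,
    # while the mock records every call for later assertions.
    spy = mock.MagicMock(wraps=client.create)
    with mock.patch.object(client, "create", spy):
        responses = [client.create(p) for p in prompts]
    return spy, responses
```

The spy's `call_count` and `call_args_list` give the test a record of every call, which is what "multiple LLM calls within one session" needs to check.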

Multi-Agent Tests

  • Test multi-agent scenarios
    • Verify each agent's actions are tracked
    • Test inter-agent communication tracking
    • Verify task delegation and completion tracking
    • Test parallel agent execution monitoring

Tool Usage Tests

  • Test tool integration
    • Verify custom tool execution tracking
    • Test built-in tool usage monitoring
    • Verify tool error handling
    • Test tool result recording

Example Workflow Tests

  • Test job posting workflow
    • Verify researcher agent tracking
    • Test writer agent monitoring
    • Verify review agent tracking
    • Test complete workflow execution

Error Handling Tests

  • Test error scenarios
    • Verify failed task tracking
    • Test exception handling
    • Verify session state after errors
    • Test recovery mechanisms

Integration with Other Tools

  • Test CrewAI with other integrations
    • Test OpenAI provider integration
    • Verify LangChain compatibility
    • Test custom tool implementations
    • Verify third-party tool usage

Special Features Tests

  • Test CrewAI-specific features
    • Verify task dependency tracking
    • Test sequential vs parallel execution
    • Verify agent role assignments
    • Test custom agent configurations

Known Issues & Mitigations

  • Tests timed out when integration and unit tests ran together
    • Solution: Run them as separate jobs in CI
  • Managing multiple layers of patching was complex
    • Solution: Centralized mock configuration
  • VCR initialization conflicts
    • Solution: Scoped VCR config to session level
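A centralized mock configuration can be as simple as one registry of patch targets applied through `contextlib.ExitStack`, so individual tests stop stacking their own `@mock.patch` decorators. This is a hedged sketch: the `PATCHES` registry, its targets and the `apply_patches` helper are hypothetical, chosen only to keep the example self-contained.

```python
from contextlib import ExitStack, contextmanager
from unittest import mock

# Hypothetical central registry: every patch the suite needs, in one place,
# instead of @mock.patch decorators stacked on individual tests.
PATCHES = {
    "os.getcwd": {"return_value": "/tmp/fake-project"},
    "os.cpu_count": {"return_value": 4},
}


@contextmanager
def apply_patches(overrides=None):
    """Apply every registered patch; per-test overrides replace the defaults."""
    config = {**PATCHES, **(overrides or {})}
    with ExitStack() as stack:
        mocks = {
            target: stack.enter_context(mock.patch(target, **kwargs))
            for target, kwargs in config.items()
        }
        # On exit, ExitStack undoes the patches in reverse order.
        yield mocks
```

`ExitStack` unwinds the patches even if the test body raises, which is what keeps layered patches from leaking between tests.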

@teocns force-pushed the feat/optimal-test-suite branch 3 times, most recently from a086fa0 to 9044b65, January 10, 2025 21:29

codecov bot commented Jan 10, 2025

Codecov Report

Attention: Patch coverage is 2.43902% with 40 lines in your changes missing coverage. Please review.

Files with missing lines           | Patch % | Lines
agentops/llms/providers/openai.py  | 2.43%   | 40 Missing ⚠️


@teocns (Contributor, Author) commented Jan 11, 2025

I'm seeing some odd behavior that I believe comes from the async event loop lifecycle management. The suite times out, possibly because one particular test hangs. So far, running each test individually works. I tried different combinations of async event loop configuration without luck.

@teocns (Contributor, Author) commented Jan 11, 2025

I found that the issue only occurs when VCR is initialized (i.e., imported).

Tests keep running until a VCR replay kicks in.

I think this has to do with async httpx.

@teocns force-pushed the feat/optimal-test-suite branch 2 times, most recently from 4e51bb1 to 0650e92, January 12, 2025 00:23
@teocns force-pushed the feat/optimal-test-suite branch from 86205ba to 6dbe54b, January 12, 2025 02:12
@teocns force-pushed the feat/optimal-test-suite branch from a606ea6 to fb2be21, January 12, 2025 02:34
@the-praxs (Member) commented

So far we have integration tests for all providers except Llama Stack and the partner integrations (CrewAI, AutoGen, TaskWeaver, LangChain).

I have Llama Stack configured with Fireworks on my local machine, but that setup is unreliable. I need to get it working reliably to record the cassettes.

For the partner integrations, I am working on the comprehensive test suite described above.

the-praxs and others added 6 commits January 14, 2025 16:52
…s found

all provider fixtures will:
Use the actual API key if it's set in the environment
Fall back to "test-api-key" if no environment variable is found

Signed-off-by: Teo <[email protected]>
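The fallback described in this commit message amounts to an environment lookup with a default. A minimal sketch; the `provider_api_key` helper and the provider-to-variable mapping are assumptions for illustration and are not taken from this PR.

```python
import os

# Assumed provider -> environment variable mapping, for illustration only.
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "groq": "GROQ_API_KEY",
}


def provider_api_key(provider: str, default: str = "test-api-key") -> str:
    """Return the real key if it is set, otherwise a placeholder for replayed tests."""
    # `or` also falls back when the variable exists but is empty.
    return os.environ.get(PROVIDER_KEY_VARS[provider]) or default
```

Using `or` rather than only the `get` default also treats an empty variable as unset. The placeholder presumably suffices because, during cassette replay, requests are answered from the recording rather than by the real API.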
@teocns force-pushed the feat/optimal-test-suite branch from dc72575 to 0a8a5f7, January 14, 2025 23:08
@teocns force-pushed the feat/optimal-test-suite branch 2 times, most recently from 6b9e3f0 to 8f1a958, January 15, 2025 09:12
@the-praxs self-requested a review January 15, 2025 13:10
@teocns force-pushed the feat/optimal-test-suite branch 3 times, most recently from 6a7ff37 to 29c29d4, January 15, 2025 13:41
@teocns force-pushed the feat/optimal-test-suite branch from 29c29d4 to 98c325c, January 15, 2025 13:47
@teocns merged commit ae0f11b into main Jan 15, 2025
9 of 10 checks passed
@teocns deleted the feat/optimal-test-suite branch January 15, 2025 14:15
teocns added a commit that referenced this pull request Jan 15, 2025
@teocns added the v0.4 label Jan 16, 2025