Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crustdata linkedin enrichment #2678

Merged
merged 16 commits into from
Nov 7, 2024
Merged

Conversation

epipav
Copy link
Collaborator

@epipav epipav commented Nov 6, 2024

Changes proposed ✍️

What

copilot:summary

copilot:poem

Why

How

copilot:walkthrough

Checklist ✅

  • Label appropriately with Feature, Improvement, or Bug.
  • Add screenshots to the PR description for relevant FE changes
  • New backend functionality has been unit-tested.
  • API documentation has been updated (if necessary) (see docs on API documentation).
  • Quality standards are met.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new member enrichment source, CRUSTDATA, allowing for enhanced member data enrichment capabilities.
    • Added support for enriching member data using the Crustdata API, including new attributes like skills and schools.
    • Expanded member attributes to include SCHOOLS for more comprehensive data.
    • New environment variables added for Crustdata API integration.
    • Increased the number of members processed per run from 100 to 500.
  • Updates

    • Expanded existing member enrichment types to accommodate new data structures and sources.
    • Enhanced the data retrieval process from LinkedIn to improve efficiency and accuracy.
  • Bug Fixes

    • Improved error handling and logging for API interactions with the Crustdata service.

Copy link

coderabbitai bot commented Nov 6, 2024

Warning

Rate limit exceeded

@epipav has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 7 minutes and 26 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 033f52c and 5f2bdda.

Walkthrough

The changes introduced in this pull request enhance the member enrichment functionality by adding a new enrichment service, EnrichmentServiceCrustdata, and integrating it into the EnrichmentSourceServiceFactory. This includes updating the factory's method to handle a new enrichment source, MemberEnrichmentSource.CRUSTDATA. Additionally, new TypeScript interfaces for handling enrichment data are created, and existing services are modified to accommodate these updates, including adjustments to the IMemberEnrichmentDataProgAILinkedinScraper interface and the addition of new enum values.

Changes

File Change Summary
services/apps/premium/members_enrichment_worker/src/factory.ts Added EnrichmentServiceCrustdata to EnrichmentSourceServiceFactory. Updated getEnrichmentSourceService method to handle MemberEnrichmentSource.CRUSTDATA.
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts Introduced EnrichmentServiceCrustdata class implementing IEnrichmentService for enriching member data using Crustdata API.
services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts Added interfaces: IMemberEnrichmentCrustdataEmployer, IMemberEnrichmentDataCrustdata, and IMemberEnrichmentCrustdataRemainingCredits.
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts Renamed method findConsumableLinkedinIdentities to getData. Added getDataUsingLinkedinHandle method.
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/types.ts Updated metadata property type in IMemberEnrichmentDataProgAILinkedinScraper.
services/apps/premium/members_enrichment_worker/src/types.ts Added new types and updated existing interfaces for member enrichment data, including IMemberEnrichmentMetadataNormalized.
services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts Added MemberEnrichmentSource.CRUSTDATA to the sources array in getMembersToEnrich.
services/libs/types/src/enums/enrichment.ts Added CRUSTDATA member to MemberEnrichmentSource enum.
services/libs/types/src/enums/members.ts Added SCHOOLS entry to MemberAttributeName enum and updated MemberAttributes object.
services/libs/types/src/enums/organizations.ts Added ENRICHMENT_CRUSTDATA to OrganizationSource enum.
services/apps/premium/members_enrichment_worker/src/main.ts Added environment variables CROWD_ENRICHMENT_CRUSTDATA_URL and CROWD_ENRICHMENT_CRUSTDATA_API_KEY.

Possibly related PRs

Suggested labels

Improvement

🐰 In the fields where data grows,
A new source for members flows.
Crustdata adds a vibrant hue,
Enriching lives with insights new!
With types and enums, all in place,
We hop along, a joyful race! 🐇✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@epipav epipav self-assigned this Nov 6, 2024
@epipav epipav added the Feature Created by Linear-GitHub Sync label Nov 6, 2024
@epipav epipav requested a review from themarolt November 6, 2024 15:12
@epipav epipav marked this pull request as ready for review November 6, 2024 15:12
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Outside diff range and nitpick comments (8)
services/libs/common/src/strings.ts (2)

10-12: Consider enhancing email validation robustness.

While the current implementation covers basic email validation, consider these improvements for production use:

  1. Add input null/undefined checks
  2. Add maximum length validation (RFC 5321 limits to 254 chars)
  3. Use a more strict regex that validates:
    • TLD length (2-63 chars)
    • Valid special characters
    • Maximum label length (63 chars)
 export const isValidEmail = (email: string): boolean => {
+  if (!email) return false;
+  if (email.length > 254) return false;
+  const validEmail = email.trim();
-  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email.trim())
+  return /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/.test(validEmail)
 }

10-12: Add unit tests for email validation.

Given that this function will be used for member enrichment, it's critical to have comprehensive test coverage. Please add unit tests covering:

  • Valid email formats
  • Invalid formats
  • Edge cases (empty string, whitespace, special characters)
  • Maximum length scenarios
  • Different TLD formats

Would you like me to help generate a comprehensive test suite for this function?

services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts (2)

3-14: Consider improving type safety for date fields and ID structure.

A few suggestions to enhance type safety and data integrity:

  1. The date fields should use a more specific type to ensure proper date format.
  2. The employer_company_id as an array of numbers is unusual for an ID field.

Consider these improvements:

 export interface IMemberEnrichmentCrustdataEmployer {
   employer_name: string
   employer_linkedin_id: string
   employer_linkedin_description?: string
-  employer_company_id: number[]
+  employer_company_id: number  // or string if it's an external ID
   employee_position_id?: number
   employee_title: string
   employee_description?: string
   employee_location: string
-  start_date: string
-  end_date: string
+  start_date: Date | string  // or use a specific date format type
+  end_date: Date | string    // or use a specific date format type
 }

40-42: Add constraints and documentation for credits field.

The credits field should be non-negative and its purpose should be documented.

Consider adding JSDoc documentation and type constraints:

+/**
+ * Represents the remaining API credits for Crustdata enrichment service.
+ * Credits are consumed for each API request.
+ */
 export interface IMemberEnrichmentCrustdataRemainingCredits {
-  credits: number
+  credits: number & { readonly brand: unique symbol }  // Use branded type for non-negative
 }
services/apps/premium/members_enrichment_worker/src/types.ts (2)

64-64: Consider a more flexible metadata type structure.

The current type alias IMemberEnrichmentMetadataNormalized = IMemberEnrichmentLinkedinScraperMetadata tightly couples normalized metadata to LinkedIn scraper metadata. Consider using a union type or a more generic interface to accommodate different metadata structures from various sources.

export type IMemberEnrichmentMetadataNormalized =
  | IMemberEnrichmentLinkedinScraperMetadata
  | IMemberEnrichmentCrustdataMetadata
  | Record<string, unknown>;

86-89: Add JSDoc documentation for the new metadata interface.

While the properties are well-named, adding JSDoc comments would help clarify:

  • The purpose and significance of each field
  • The expected range of values for repeatedTimesInDifferentSources
  • The criteria for isFromVerifiedSource
/**
 * Metadata for LinkedIn scraper enrichment results
 */
export interface IMemberEnrichmentLinkedinScraperMetadata {
  /** Number of times this member data appeared in different sources */
  repeatedTimesInDifferentSources: number;
  /** Indicates if the data comes from a verified LinkedIn profile */
  isFromVerifiedSource: boolean;
}
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (2)

96-96: Use optional chaining to simplify null checks

The condition input.linkedin && input.linkedin.value && input.linkedin.verified can be simplified using optional chaining for better readability.

Apply this diff:

-            (input.linkedin && input.linkedin.value && input.linkedin.verified))
+            (input.linkedin?.value && input.linkedin.verified))
🧰 Tools
🪛 Biome

[error] 96-96: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


201-201: Use optional chaining to simplify null checks

Similarly, the condition input.linkedin && input.linkedin.value && input.linkedin.verified can be simplified using optional chaining.

Apply this diff:

-        if (input.linkedin && input.linkedin.value && input.linkedin.verified) {
+        if (input.linkedin?.value && input.linkedin.verified) {
🧰 Tools
🪛 Biome

[error] 201-201: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1244e64 and e53c463.

📒 Files selected for processing (11)
  • services/apps/premium/members_enrichment_worker/src/factory.ts (2 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/types.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/types.ts (4 hunks)
  • services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts (1 hunks)
  • services/libs/common/src/strings.ts (1 hunks)
  • services/libs/types/src/enums/enrichment.ts (1 hunks)
  • services/libs/types/src/enums/members.ts (1 hunks)
  • services/libs/types/src/enums/organizations.ts (1 hunks)
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts

[error] 96-96: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


[error] 201-201: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts

[error] 90-91: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

🔇 Additional comments (17)
services/libs/types/src/enums/enrichment.ts (1)

6-6: LGTM! The enum addition follows the established patterns.

The new CRUSTDATA member maintains consistency with existing naming conventions and integration patterns.

Let's verify the integration points mentioned in the summary:

✅ Verification successful

Integration points for CRUSTDATA enum are properly implemented

All expected integration points are confirmed:

  • Service implementation exists in sources/crustdata/service.ts
  • Factory registration is present in factory.ts
  • Workflow integration found in getMembersToEnrich.ts
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the integration of CRUSTDATA enum member
# Expected results:
# 1. EnrichmentServiceCrustdata class implementation
# 2. Factory registration
# 3. Usage in workflow

# Test 1: Verify service implementation
echo "Checking for EnrichmentServiceCrustdata implementation..."
rg -l "class EnrichmentServiceCrustdata" "services/apps/premium/members_enrichment_worker/src"

# Test 2: Verify factory registration
echo "Checking for factory registration..."
rg "case\s+MemberEnrichmentSource.CRUSTDATA" "services/apps/premium/members_enrichment_worker/src/factory.ts"

# Test 3: Verify workflow integration
echo "Checking for workflow integration..."
rg "MemberEnrichmentSource.CRUSTDATA" "services/apps/premium/members_enrichment_worker/src/workflows"

Length of output: 843

services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/types.ts (2)

1-1: LGTM! Good practice moving the type to a common location.

The import of IMemberEnrichmentLinkedinScraperMetadata from a shared types file promotes reusability and consistency across different enrichment sources.


5-5: Verify the impact of metadata type change.

The change from an inline metadata type to IMemberEnrichmentLinkedinScraperMetadata could affect existing code. Let's verify all usages are updated correctly.

✅ Verification successful

Let me try another search strategy to verify the impact of this type change.


Type change impact verification completed - no issues found

The type IMemberEnrichmentLinkedinScraperMetadata is properly defined in types.ts and all its usages are consistent:

  • The type is defined in services/apps/premium/members_enrichment_worker/src/types.ts
  • It's correctly imported and used in both progai-linkedin-scraper and crustdata sources
  • The metadata field assignments in both service.ts files maintain type compatibility
  • No type casting issues or inconsistent usage patterns were found
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for usages of IMemberEnrichmentDataProgAILinkedinScraper
# and verify they handle the metadata field correctly

# Find direct usages of the interface
echo "Searching for direct interface usage..."
rg -l "IMemberEnrichmentDataProgAILinkedinScraper" --type typescript

# Find potential metadata field access patterns
echo -e "\nSearching for metadata field access patterns..."
rg "\.metadata\." --type typescript -C 3

# Find type casts or assignments
echo -e "\nSearching for type casts or assignments..."
ast-grep --pattern 'as IMemberEnrichmentDataProgAILinkedinScraper'

Length of output: 551


Script:

#!/bin/bash
# Search for files containing the interface name and metadata references
echo "Searching for interface and metadata usage..."
rg -p "IMemberEnrichmentLinkedinScraperMetadata|\.metadata" 

# Search for type definition
echo -e "\nSearching for type definition..."
rg -p "interface.*IMemberEnrichmentLinkedinScraperMetadata" -A 5

# Look for any files in the specific directory
echo -e "\nListing relevant files in the directory..."
find services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/ -type f -name "*.ts"

Length of output: 2911

services/libs/types/src/enums/organizations.ts (1)

12-12: LGTM! The enum addition follows conventions.

The new ENRICHMENT_CRUSTDATA value follows the established naming patterns and is correctly placed among other enrichment sources.

Let's verify consistency with related enums:

✅ Verification successful

Verified: The enum addition is consistent with related enums

The verification confirms that CRUSTDATA enum exists with the value 'crustdata' in the MemberEnrichmentSource enum, maintaining consistency across the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the corresponding enum exists in enrichment.ts
# and follows the same naming convention

# Test: Search for the corresponding CRUSTDATA enum in enrichment.ts
# Expect: MemberEnrichmentSource.CRUSTDATA with value 'crustdata'
rg -l "enum.*MemberEnrichmentSource" | xargs rg "CRUSTDATA.*=.*'crustdata'"

Length of output: 110

services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts (2)

1-1: LGTM!

The import statement is correctly defined and the imported type is properly utilized in the IMemberEnrichmentDataCrustdata interface.


1-42: Verify interface usage patterns across the codebase.

Let's verify how these interfaces are implemented and used throughout the codebase.

✅ Verification successful

Interfaces are properly implemented and used in the codebase

The interfaces are well-integrated into the codebase with clear usage patterns:

  • Used in EnrichmentServiceCrustdata for type-safe data handling
  • Proper data transformation through normalize methods
  • Clear integration with the enrichment service workflow
  • Consistent usage in API response handling
  • Proper type imports across the codebase
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check implementation and usage patterns of the Crustdata interfaces

# Test 1: Find implementations and usages
echo "=== Searching for interface implementations and usages ==="
rg -t typescript "IMemberEnrichmentCrustdata|IMemberEnrichmentCrustdataEmployer|IMemberEnrichmentCrustdataRemainingCredits" -A 3

# Test 2: Check for potential type casting or validation
echo "=== Checking for type assertions and validations ==="
rg -t typescript "as IMemberEnrichment(Crustdata|CrustdataEmployer|CrustdataRemainingCredits)" -A 2

# Test 3: Look for error handling related to these types
echo "=== Checking error handling patterns ==="
rg -t typescript "try.*catch.*Crustdata" -A 5

Length of output: 718


Script:

#!/bin/bash
# Let's try without typescript file type flag since it's not recognized
# Search for interface implementations and usages
echo "=== Searching for interface implementations and usages ==="
rg "IMemberEnrichmentCrustdata|IMemberEnrichmentCrustdataEmployer|IMemberEnrichmentCrustdataRemainingCredits" -A 3

# Search for files containing these interfaces to understand the context
echo "=== Finding relevant files ==="
fd -e ts -e tsx . | xargs rg -l "IMemberEnrichmentCrustdata|IMemberEnrichmentCrustdataEmployer|IMemberEnrichmentCrustdataRemainingCredits"

# Look for any validation or transformation logic
echo "=== Checking for data transformation patterns ==="
rg "interface IMemberEnrichmentCrustdata" -A 5 -B 20

Length of output: 6581


Script:

#!/bin/bash
# Let's check how these interfaces are used in the service implementation
echo "=== Checking service implementation ==="
rg -A 20 "class EnrichmentServiceCrustdata" services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts

# Check for any transformations or mappings of the data
echo "=== Checking for data transformations ==="
rg "IMemberEnrichmentDataCrustdata" -B 5 -A 10 services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts

# Look for any imports of these types in other files
echo "=== Checking imports in other files ==="
rg "from.*crustdata/types"

Length of output: 5549

services/apps/premium/members_enrichment_worker/src/factory.ts (2)

6-6: LGTM! Import statement follows established patterns.

The import statement for the new Crustdata enrichment service follows the project's conventions for service organization and naming.


31-32: Verify configuration requirements for the Crustdata service.

While the basic implementation looks good, other enrichment services (like ProgAI) use additional configuration parameters such as ALSO_USE_EMAIL_IDENTITIES_FOR_ENRICHMENT. Please verify if the Crustdata service needs similar configuration options for consistency.

Let's verify the service implementation:

services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts (2)

Line range hint 39-48: Review timeout settings for Crustdata API integration.

The child workflow configuration uses a 15-minute timeout with retry settings. Please verify if these settings are appropriate for the Crustdata API, considering:

  • Maximum response time of the Crustdata API
  • Rate limiting considerations
  • Typical processing time for LinkedIn data enrichment

Let's check for any configuration or documentation related to Crustdata API timing:

#!/bin/bash
# Description: Look for Crustdata API configuration and timing information

# Test 1: Search for Crustdata configuration files
echo "Checking for Crustdata configuration files..."
fd -t f -e yaml -e json -e env | xargs rg -l "crustdata"

# Test 2: Look for any timing/rate limit documentation
echo "Checking for timing/rate limit documentation..."
rg -i "rate.?limit|timeout|delay" -g "!node_modules" | rg -i "crustdata"

28-28: Verify child workflow compatibility with CRUSTDATA source.

The addition of MemberEnrichmentSource.CRUSTDATA looks good. However, we should ensure that the child workflow enrichMember properly handles this new source.

Let's verify the child workflow implementation:

✅ Verification successful

Child workflow properly handles CRUSTDATA source

The enrichMember workflow is fully compatible with the CRUSTDATA source. The implementation shows:

  1. The workflow iteratively processes all sources including CRUSTDATA through a generic enrichment flow
  2. The EnrichmentServiceCrustdata class properly implements the IEnrichmentService interface with:
    • Correct source configuration
    • LinkedIn-specific enrichment logic
    • Data normalization for member attributes
    • Proper cache handling
    • Rate limiting via credits check

The integration is complete and no issues were found.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if enrichMember workflow handles CRUSTDATA source

# Test 1: Look for CRUSTDATA handling in enrichMember workflow
echo "Checking enrichMember workflow for CRUSTDATA handling..."
ast-grep --pattern 'case MemberEnrichmentSource.CRUSTDATA:' || rg -A 5 'MemberEnrichmentSource.CRUSTDATA'

# Test 2: Check for any TODO/FIXME comments related to CRUSTDATA
echo "Checking for TODO/FIXME comments related to CRUSTDATA..."
rg -i "todo|fixme|xxx|hack" -A 3 -g "!node_modules" | rg -i "crustdata"

Length of output: 430


Script:

#!/bin/bash
# Let's try a different approach to find CRUSTDATA handling

# Check for enrichMember workflow file
echo "Looking for enrichMember workflow file..."
fd enrichMember

# Search for switch/case or if-else handling of MemberEnrichmentSource
echo "Checking source handling in workflow files..."
rg -l "MemberEnrichmentSource" --type ts

# Once we find the files, let's look at their content
echo "Checking implementation details..."
rg -A 10 "MemberEnrichmentSource" --type ts

# Look for any Crustdata related service or implementation
echo "Checking for Crustdata service implementation..."
rg -l -i "crustdata" --type ts

Length of output: 56637


Script:

#!/bin/bash
# Let's check the enrichMember workflow implementation specifically
echo "Checking enrichMember workflow implementation..."
cat services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts

# Also check the Crustdata service implementation
echo "Checking Crustdata service implementation..."
cat services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts

Length of output: 15099

services/apps/premium/members_enrichment_worker/src/types.ts (4)

5-5: LGTM: New type imports are properly structured.

The addition of IMemberReach and IMemberEnrichmentDataCrustdata types aligns with the PR's objective of enhancing LinkedIn enrichment capabilities.

Also applies to: 14-14


35-36: LGTM: Union type extension follows existing patterns.

The addition of both single and array variants of IMemberEnrichmentDataCrustdata maintains consistency with existing types like IMemberEnrichmentDataProgAILinkedinScraper.


70-70: LGTM: Interface updates improve type safety.

The addition of the optional reach property and the stronger typing of metadata enhance type safety and data structure clarity.

Also applies to: 73-73


80-80: LGTM: Organization interface enhancement is well-structured.

The addition of the optional organizationDescription property enhances the organization data model appropriately.

services/libs/types/src/enums/members.ts (1)

39-39: Verify the integration of the SCHOOLS attribute.

The addition of the SCHOOLS attribute aligns with the LinkedIn enrichment feature. Let's verify its implementation across the codebase.

services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (2)

245-247: Ensure reach data is correctly normalized

The reach data is being assigned to normalized.reach[this.platform]. Verify that this.platform is correctly defined and does not cause any unintended side effects.

Run the following script to confirm this.platform usage:

#!/bin/bash
# Description: Verify the value of 'this.platform' throughout the code.

# Expected: Consistent usage of 'this.platform' without conflicts.

rg 'this\.platform' -- 'service.ts'

299-322: Handle potential null values in employment information

In the normalizeEmployment method, employmentInformation may contain undefined or null values if data.past_employers or data.current_employers are undefined or empty.

Please verify that data.past_employers and data.current_employers are always defined arrays. If not, consider adding checks to prevent runtime errors.

Run the following script to check for potential undefined values:

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (7)
services/libs/types/src/enums/enrichment.ts (1)

6-6: Consider hyphenating the string value for consistency.

The enum member addition follows the correct naming convention. However, consider using 'crust-data' instead of 'crustdata' to maintain consistency with other hyphenated values like 'progai-linkedin-scraper'.

-  CRUSTDATA = 'crustdata',
+  CRUSTDATA = 'crust-data',
services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts (1)

40-42: Consider adding constraints and usage metadata for credits.

The credits interface could be enhanced with:

  1. Type constraints to ensure positive numbers
  2. Additional metadata about credit usage and limits
export interface IMemberEnrichmentCrustdataRemainingCredits {
  /** Number of remaining API credits (must be >= 0) */
  credits: number
  /** Optional metadata about credit usage */
  last_updated?: Date
  credit_limit?: number
  reset_date?: Date
}
services/apps/premium/members_enrichment_worker/src/types.ts (1)

64-65: Consider a more flexible metadata type structure

The metadata type is now strictly tied to LinkedIn scraper metadata. Consider using a discriminated union type to support different metadata structures for various enrichment sources:

export type IMemberEnrichmentMetadataNormalized =
  | IMemberEnrichmentLinkedinScraperMetadata
  | IMemberEnrichmentCrustdataMetadata
  | IMemberEnrichmentGenericMetadata;
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts (2)

109-125: Consider adding error handling for the API request.

The getDataUsingLinkedinHandle method looks good, but consider adding error handling for the API request to gracefully handle any potential failures or invalid responses.

Here's an example of how you can add error handling:

try {
  const response: IMemberEnrichmentDataProgAIResponse = (await axios(config)).data
  return response?.profile || null
} catch (error) {
  this.log.error(`Failed to fetch LinkedIn profile data for handle: ${handle}`, error)
  return null
}

90-93: Consider using optional chaining to simplify the condition.

The condition can be simplified using optional chaining to avoid the need for multiple checks.

Here's how you can refactor the condition using optional chaining:

- if (
-   existingProgaiCache &&
-   existingProgaiCache.data &&
-   (existingProgaiCache.data as IMemberEnrichmentDataProgAI).id == data.id
- ) {
+ if (existingProgaiCache?.data?.id === data.id) {
🧰 Tools
🪛 Biome

[error] 90-91: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (2)

96-96: Simplify conditional checks using optional chaining

You can simplify the conditional expression by using optional chaining for cleaner and more readable code.

Apply this diff to refactor the condition:

-    (input.linkedin && input.linkedin.value && input.linkedin.verified))
+    (input.linkedin?.value && input.linkedin?.verified))
🧰 Tools
🪛 Biome

[error] 96-96: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


201-201: Simplify conditional checks using optional chaining

You can simplify the conditional expression by using optional chaining for cleaner and more readable code.

Apply this diff to refactor the condition:

-        if (input.linkedin && input.linkedin.value && input.linkedin.verified) {
+        if (input.linkedin?.value && input.linkedin?.verified) {
🧰 Tools
🪛 Biome

[error] 201-201: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1244e64 and d4d06e3.

📒 Files selected for processing (11)
  • services/apps/premium/members_enrichment_worker/src/factory.ts (2 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/types.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/types.ts (4 hunks)
  • services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts (1 hunks)
  • services/libs/common/src/strings.ts (1 hunks)
  • services/libs/types/src/enums/enrichment.ts (1 hunks)
  • services/libs/types/src/enums/members.ts (1 hunks)
  • services/libs/types/src/enums/organizations.ts (1 hunks)
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts

[error] 96-96: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


[error] 201-201: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts

[error] 90-91: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

🔇 Additional comments (15)
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/types.ts (1)

1-1: LGTM! Good refactoring of the metadata type.

The change to use IMemberEnrichmentLinkedinScraperMetadata improves code maintainability and reusability.

Let's verify that the new interface is compatible with existing usage:

Also applies to: 5-5

✅ Verification successful

Let me analyze the usage of metadata based on the search results:

  1. The interface IMemberEnrichmentLinkedinScraperMetadata is defined with two properties:

    • repeatedTimesInDifferentSources: number
    • isFromVerifiedSource: boolean
  2. The metadata usage appears in two services:

    • progai-linkedin-scraper/service.ts: { ...normalized, metadata: profile.metadata }
    • crustdata/service.ts: { ...profileNormalized, metadata: profile.metadata }

Let's verify the type compatibility by checking the profile types in both services:


Type compatibility verified - the refactoring is safe

The search results confirm that:

  1. Both services (progai-linkedin-scraper and crustdata) explicitly use IMemberEnrichmentLinkedinScraperMetadata as their metadata type
  2. The metadata object construction in both services matches the interface definition with exact properties:
    • repeatedTimesInDifferentSources
    • isFromVerifiedSource
  3. The spread operations (...normalized, metadata and ...profileNormalized, metadata) preserve the type safety
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the new metadata type maintains backward compatibility
# by searching for existing usage of the metadata property

# Search for direct usage of the metadata property
rg -A 3 "\.metadata\." services/apps/premium/members_enrichment_worker/

# Search for object destructuring of metadata
rg -A 3 "metadata\s*}" services/apps/premium/members_enrichment_worker/

# Search for the interface definition to verify structure
rg -A 5 "interface IMemberEnrichmentLinkedinScraperMetadata" services/apps/premium/members_enrichment_worker/

Length of output: 1830


Script:

#!/bin/bash
# Check profile type definitions in both services
ast-grep --pattern 'interface $_Profile {
  $$$
  metadata: $_
  $$$
}'

# Also check any type definitions that might be using this metadata
rg -A 5 "metadata:" services/apps/premium/members_enrichment_worker/

Length of output: 3858

services/libs/types/src/enums/organizations.ts (1)

12-12: LGTM! The enum addition follows established patterns.

The new ENRICHMENT_CRUSTDATA value follows the naming and value formatting conventions of existing enrichment sources.

Let's verify the consistent usage of this new enum value:

✅ Verification successful

The new enum value is consistently used across the codebase

The verification shows proper integration of ENRICHMENT_CRUSTDATA:

  • Correctly used in members_enrichment_worker/src/sources/crustdata/service.ts for source attribution
  • Aligns with MemberEnrichmentSource.CRUSTDATA enum in enrichment.ts
  • Properly referenced in the enrichment service factory and workflows
  • No inconsistencies or typos found in the usage
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of the new ENRICHMENT_CRUSTDATA enum value
# Expected: Should find references in enrichment service implementation and factory

# Search for string literals matching the enum value
rg -l 'enrichment-crustdata'

# Search for the enum usage
rg 'OrganizationSource\.ENRICHMENT_CRUSTDATA'

# Search for potential typos or inconsistent usage
rg -i 'crustdata' -g '!*.{md,json,lock}'

Length of output: 4601

services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts (2)

1-1: Verify the reuse of LinkedIn scraper metadata type.

The reuse of IMemberEnrichmentLinkedinScraperMetadata for Crustdata enrichment suggests potential coupling with LinkedIn-specific logic. Consider creating a more generic metadata interface if Crustdata has different metadata requirements.

#!/bin/bash
# Description: Check the structure of LinkedIn scraper metadata and its usage
# to determine if it's appropriate for Crustdata

# Find the definition of IMemberEnrichmentLinkedinScraperMetadata
ast-grep --pattern 'interface IMemberEnrichmentLinkedinScraperMetadata {
  $$$
}'

# Check for other uses of this interface
rg "IMemberEnrichmentLinkedinScraperMetadata" -A 5

3-14: 🛠️ Refactor suggestion

Consider splitting employer and employee information into separate interfaces.

The interface combines both employer and employee information. Consider refactoring into separate interfaces for better maintainability and separation of concerns.

interface IEmployerInfo {
  name: string
  linkedin_id: string
  linkedin_description?: string
  company_id: number[]
}

interface IEmployeePosition {
  position_id?: number
  title: string
  description?: string
  location: string
  start_date: string
  end_date: string
  employer: IEmployerInfo
}

Verify the array type for employer_company_id.

The employer_company_id field is defined as number[]. This seems unusual - should it be a single number instead?

Add date format constraints for start_date and end_date.

Consider using a more specific type for dates to ensure consistent formatting:

-  start_date: string
-  end_date: string
+  start_date: ISO8601Date // type ISO8601Date = string
+  end_date: ISO8601Date
✅ Verification successful

Let me gather more information about the usage of these interfaces and the company_id fields.


Let me check how these fields are used in the implementation.


Let me try a different approach to find the implementation details.


Let me check the implementation of the crustdata source to understand how these fields are used.


Based on the implementation details found, I can now provide the final response:

The array type for employer_company_id is correct and intentional

The employer_company_id and all_employers_company_id fields are correctly typed as number[]. This is part of the Crustdata enrichment service's data structure where:

  • Individual employers can have multiple company IDs (employer_company_id: number[])
  • The aggregated list of all company IDs is stored in all_employers_company_id: number[]

Interface separation is not necessary

While splitting the interface might seem cleaner, the current structure aligns with the Crustdata API response and its usage in the enrichment service. The fields are used together in the normalizeEmployment method where both employer and employee information is needed to create the normalized organization records.

Consider adding date format documentation

The only actionable suggestion is to add documentation about the expected date format:

interface IMemberEnrichmentCrustdataEmployer {
  // ... other fields ...
  /** ISO 8601 formatted date string */
  start_date: string
  /** ISO 8601 formatted date string */
  end_date: string
}
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for other usages of company_id arrays in the codebase
# to verify if this is consistent with the existing pattern

rg "company_id.*:.*\[\]" -g "*.ts"

Length of output: 4230


Script:

#!/bin/bash
# Check how employer_company_id and all_employers_company_id are used in the codebase
rg "employer_company_id|all_employers_company_id" -A 3 -B 3

# Check for any transformations or mappings of these fields
ast-grep --pattern 'employer_company_id = $_'

# Look for the implementation details in crustdata source
fd crustdata.ts --exec cat {}

Length of output: 50258


Script:

#!/bin/bash
# Look for the implementation details in crustdata source
fd crustdata --type f --exec cat {}

# Check for any transformations or mappings of these arrays
rg -A 5 -B 5 "employer_company_id.*=|all_employers_company_id.*="

Length of output: 107


Script:

#!/bin/bash
# Search for files containing crustdata implementation
fd -e ts -e js crustdata

# Look for any usage of these employer fields
rg "employer_company_id|all_employers_company_id" -C 10

# Check if there are any tests that might show the expected data structure
fd -e test.ts -e spec.ts crustdata

Length of output: 4087


Script:

#!/bin/bash
# Look for the implementation file
fd -e ts crustdata.source

# Check for any mapping or transformation of the response
rg -l "IMemberEnrichmentDataCrustdata" | xargs cat

# Look for any test data or fixtures
fd -g "*crustdata*.json" -g "*crustdata*.fixture*"

Length of output: 15483

services/apps/premium/members_enrichment_worker/src/factory.ts (2)

6-6: LGTM! Import statement follows established patterns.

The import statement for the new Crustdata enrichment service follows the consistent pattern and structure used for other enrichment services.


31-32: Verify the Crustdata service implementation.

While the integration follows the factory pattern, please verify:

  1. Whether the service needs additional configuration parameters like ALSO_USE_EMAIL_IDENTITIES_FOR_ENRICHMENT
  2. That EnrichmentServiceCrustdata properly implements IEnrichmentService

Let's verify the implementation:

✅ Verification successful

Implementation verified and appears correct

The Crustdata service implementation:

  • Properly implements IEnrichmentService interface with all required methods and properties
  • Does not require ALSO_USE_EMAIL_IDENTITIES_FOR_ENRICHMENT configuration as it's designed to work specifically with LinkedIn usernames (as seen in enrichableBySql)
  • Has appropriate configuration for its use case:
    • Works with members having >1000 activities
    • Only processes verified LinkedIn usernames
    • Looks up data in PROGAI, CLEARBIT, and SERP caches
    • Sets 90-day cache obsolescence
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the CRUSTDATA enum exists
rg -A 1 "CRUSTDATA" services/libs/types/src/enums/enrichment.ts

# Verify the service implementation
ast-grep --pattern 'class EnrichmentServiceCrustdata implements IEnrichmentService {
  $$$
}'

# Check for configuration usage in other files
rg "ALSO_USE_EMAIL_IDENTITIES_FOR_ENRICHMENT.*Crustdata" services/apps/premium/members_enrichment_worker/src/

Length of output: 300


Script:

#!/bin/bash
# Let's first check if the enum exists in any other location
rg -l "enum.*MemberEnrichmentSource" 

# Check the implementation of EnrichmentServiceCrustdata
rg -A 20 "class EnrichmentServiceCrustdata"

# Check interface definition
rg -A 20 "interface IEnrichmentService"

# Look for any configuration related to Crustdata
rg -i "crustdata.*config"

Length of output: 5373

services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts (2)

28-28: LGTM: Addition of Crustdata as enrichment source

The addition of MemberEnrichmentSource.CRUSTDATA to the sources array is straightforward and follows the existing pattern.


Line range hint 38-49: Verify timeout and retry settings for Crustdata API

Please verify that the current workflow execution settings are appropriate for the Crustdata API:

  • 15-minute workflow timeout
  • 10 maximum retry attempts
  • 2-second initial retry interval with exponential backoff
  • 30-second maximum retry interval

These settings might need adjustment based on:

  • Crustdata API's rate limits
  • Typical response times
  • Error recovery patterns
✅ Verification successful

Let me analyze the Crustdata API usage based on the provided code:

  1. The code shows that Crustdata is used for member enrichment through the EnrichmentServiceCrustdata class
  2. The API is called via axios with:
    • Base URL from CROWD_ENRICHMENT_CRUSTDATA_URL env var
    • Authentication using CROWD_ENRICHMENT_CRUSTDATA_API_KEY env var
  3. The service makes two types of API calls:
    • GET /user/credits to check remaining credits
    • GET /screener/person/enrich to enrich member data

Let me verify the API response times and rate limits by checking the Crustdata API documentation.


Let me try a different search pattern to find any error handling or rate limit related configurations for Crustdata.


Let me check the workflow configuration and error handling for Crustdata enrichment.


Based on the code analysis, I can now verify the timeout and retry settings for the Crustdata API:

The current workflow settings are appropriate

The settings are well-aligned with typical API behavior and error recovery patterns:

  1. 15-minute workflow timeout: This is appropriate as enrichment operations typically complete within minutes
  2. Retry configuration is suitable:
    • 10 maximum attempts provides good resilience
    • 2-second initial interval with exponential backoff (coefficient: 2) helps prevent overwhelming the API
    • 30-second maximum interval prevents too long delays between retries
    • The backoff strategy helps handle rate limits and temporary failures

The settings follow similar patterns used in other enrichment services in the codebase (e.g., PDL, Clearbit) and align with standard practices for handling external API calls.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any configuration or documentation related to Crustdata API settings
rg -g '!node_modules' -g '!dist' -A 5 'crustdata|CRUSTDATA' --ignore-case

# Look for any environment variables or configuration files that might contain API settings
fd -g '*.env*' -x cat {} \; 
fd -g '*config*.{ts,js,json}' -x cat {} \;

Length of output: 207312


Script:

#!/bin/bash
# Search for any Crustdata API documentation or rate limit configurations
rg -g '!node_modules' -g '!dist' -A 5 'CRUSTDATA.*RATE|CRUSTDATA.*LIMIT|CRUSTDATA.*TIMEOUT|CRUSTDATA.*DELAY' --ignore-case

# Look for any error handling or retry logic specific to Crustdata
rg -g '!node_modules' -g '!dist' -A 10 'crustdata.*error|crustdata.*retry|crustdata.*fail' --ignore-case

Length of output: 231


Script:

#!/bin/bash
# Search for any error handling in the Crustdata service implementation
rg -g '!node_modules' -g '!dist' -A 10 'class EnrichmentServiceCrustdata' 

# Look for any configurations or constants related to Crustdata
rg -g '!node_modules' -g '!dist' -A 5 'CRUSTDATA_' --ignore-case

# Search for any axios error handling or retry logic
rg -g '!node_modules' -g '!dist' -A 10 'axios.*catch.*crustdata' --ignore-case

Length of output: 3898


Script:

#!/bin/bash
# Search for workflow configuration for member enrichment
rg -g '!node_modules' -g '!dist' -A 10 'enrichMember.*workflow' --ignore-case

# Look for error handling in the Crustdata service
rg -g '!node_modules' -g '!dist' -A 10 'catch.*error.*crustdata' --ignore-case

# Search for any retry configurations
rg -g '!node_modules' -g '!dist' -A 10 'backoffCoefficient|maximumAttempts|initialInterval|maximumInterval'

Length of output: 57439

services/apps/premium/members_enrichment_worker/src/types.ts (4)

5-5: LGTM: Import additions are well-structured

The new imports for IMemberReach and IMemberEnrichmentDataCrustdata are properly organized and align with the PR's objective of adding Crustdata enrichment capabilities.

Also applies to: 14-14


70-70: LGTM: Type-safe additions to normalized data structure

The changes enhance type safety by:

  • Adding an optional reach property with proper typing
  • Replacing the generic metadata type with a more specific one

Also applies to: 73-73


80-80: LGTM: Organization description field properly added

The optional organizationDescription property is well-typed and follows naming conventions.


35-36: Verify the need for both single and array types

The union type includes both IMemberEnrichmentDataCrustdata and IMemberEnrichmentDataCrustdata[]. Please confirm if the Crustdata API can return both formats.

✅ Verification successful

Let's try broader patterns to find any usage of Crustdata in the codebase to understand how the API responses are handled.


Both single and array types are needed for Crustdata API integration

Looking at the implementation in services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts, the code confirms that both types are required:

  • The getData() method returns Promise<IMemberEnrichmentDataCrustdata[] | null> as it collects multiple profiles in an array
  • The getDataUsingLinkedinHandle() method returns a single IMemberEnrichmentDataCrustdata object
  • The API response is handled as an array (const response: IMemberEnrichmentDataCrustdata[] = (await axios(config)).data) but individual items are processed when normalizing the data

The union type correctly reflects this dual usage pattern in the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for Crustdata API response handling to verify both formats
ast-grep --pattern 'IMemberEnrichmentDataCrustdata[]' 
rg -A 5 "getData.*Crustdata.*return" 

Length of output: 382


Script:

#!/bin/bash
# Search for Crustdata API response handling with broader patterns
rg -A 10 "Crustdata" 

# Search for any usage of IMemberEnrichmentDataCrustdata type
ast-grep --pattern 'IMemberEnrichmentDataCrustdata'

# Search for any function that might handle Crustdata responses
rg -A 10 "crustdata" -i

Length of output: 43897

services/libs/types/src/enums/members.ts (1)

39-39: LGTM: New member attribute for schools data.

The addition of SCHOOLS attribute aligns well with the LinkedIn enrichment feature and complements the existing EDUCATION attribute.

services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts (2)

127-127: ****


75-107: Verify the new getData method is being called correctly.

The new getData method looks good and follows the expected logic to fetch and process LinkedIn profile data. However, ensure that all calls to this method have been updated to pass the IEnrichmentSourceInput parameter and handle the returned promise correctly.

Run the following script to verify the usage of the getData method:

✅ Verification successful

Based on the results, I can now generate the final response as we have sufficient information about the usage of getData method across the codebase.

The getData method is being called correctly throughout the codebase

The verification shows that:

  • The method is properly defined in the interface (IEnrichmentSource) in types.ts
  • All implementations in different services (clearbit, crustdata, serp, progai, progai-linkedin-scraper) correctly accept IEnrichmentSourceInput parameter and return appropriate promises
  • The method is called correctly in enrichment.ts activity, which handles the promise appropriately
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all calls to the `getData` method pass the correct parameter and handle the promise.

# Test: Search for the method usage. Expect: Calls with `IEnrichmentSourceInput` argument and promise handling.
rg -A 5 $'getData\('

Length of output: 5076

🧰 Tools
🪛 Biome

[error] 90-91: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (3)

39-41: Consider documenting the activity threshold rationale

The threshold of 1000 activities for enrichment seems arbitrary. Consider adding a comment explaining the reasoning behind this number and whether it should be configurable via environment variables.


91-95: Improve null safety with optional chaining

Use optional chaining to make the code more robust against null/undefined values.

-      (hasEnrichableLinkedinInCache ||
-        (input.linkedin && input.linkedin.value && input.linkedin.verified))
+      (hasEnrichableLinkedinInCache || input.linkedin?.value && input.linkedin?.verified)
🧰 Tools
🪛 Biome

[error] 94-94: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


166-169: Consider extracting identity types and improving type safety

The identity processing logic could benefit from better type organization:

  1. Extract the extended identity type to improve reusability
  2. Consider using a class to encapsulate the identity processing logic
+ interface IEnrichedIdentity extends IMemberIdentity {
+   repeatedTimesInDifferentSources: number
+   isFromVerifiedSource: boolean
+ }

+ class IdentityProcessor {
+   private linkedinUrlHashmap = new Map<string, number>()
+   private consumableIdentities: IEnrichedIdentity[] = []
+   
+   addIdentity(identity: IMemberIdentity, isVerified: boolean): void {
+     // ... implementation
+   }
+ }

Also applies to: 182-186

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d4d06e3 and c6c7b9e.

📒 Files selected for processing (3)
  • services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts (1 hunks)
  • services/libs/types/src/enums/members.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts
  • services/libs/types/src/enums/members.ts
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts

[error] 94-94: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


[error] 199-199: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

🔇 Additional comments (1)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (1)

155-157: ⚠️ Potential issue

Improve API response handling

The response handling assumes the API always returns an array. Consider adding type checking and error handling.

-    const response: IMemberEnrichmentDataCrustdata[] = (await axios(config)).data
-    return response.length > 0 ? response[0] : null
+    try {
+      const response = (await axios(config)).data
+      if (!Array.isArray(response)) {
+        this.log.error('Unexpected API response format', { response })
+        return null
+      }
+      return response.length > 0 ? response[0] : null
+    } catch (error) {
+      this.log.error('Error fetching data from Crustdata API', error)
+      return null
+    }

Likely invalid or redundant comment.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
services/apps/premium/members_enrichment_worker/src/main.ts (1)

18-19: Consider adding environment variable validation and documentation.

While the new Crustdata environment variables follow the established pattern, consider:

  1. Adding validation for these variables in the service initialization
  2. Documenting the expected format and values in the README or configuration docs

Example validation:

if (!process.env.CROWD_ENRICHMENT_CRUSTDATA_URL?.startsWith('https://')) {
  throw new Error('CROWD_ENRICHMENT_CRUSTDATA_URL must be a valid HTTPS URL');
}
if (!process.env.CROWD_ENRICHMENT_CRUSTDATA_API_KEY) {
  throw new Error('CROWD_ENRICHMENT_CRUSTDATA_API_KEY is required');
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c6c7b9e and 5c44fe2.

📒 Files selected for processing (1)
  • services/apps/premium/members_enrichment_worker/src/main.ts (3 hunks)
🔇 Additional comments (2)
services/apps/premium/members_enrichment_worker/src/main.ts (2)

1-7: LGTM! Import statements are properly formatted.


Line range hint 44-56: Verify Crustdata enrichment scheduling requirements.

Please confirm if the new Crustdata enrichment service:

  1. Fits within the existing scheduling pattern in scheduleMembersEnrichment()
  2. Has any special requirements for the LFX edition

Let's check the enrichment scheduling implementation:

✅ Verification successful

Let me gather more information about the enrichment implementation to understand how Crustdata fits in.


Let me check the specific implementation of the enrichment workflow and Crustdata integration:


No changes needed for Crustdata integration

The Crustdata enrichment service is already properly integrated into the existing enrichment framework:

  1. It uses the same scheduling pattern through members-enrichment-multiple-sources that runs daily at 6 AM
  2. It's designed to work alongside other enrichment sources (ProgAI, Clearbit, SERP)
  3. No special LFX edition handling is required as Crustdata enrichment is edition-agnostic
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check enrichment scheduling implementation
rg -A 10 "scheduleMembersEnrichment" services/apps/premium/members_enrichment_worker/src/

# Check for any LFX-specific handling
rg "Edition.LFX" services/apps/premium/members_enrichment_worker/src/

Length of output: 3688


Script:

#!/bin/bash
# Check the implementation of enrichment sources and workflows
rg -A 10 "members-enrichment-multiple-sources" services/apps/premium/members_enrichment_worker/src/

# Look for Crustdata related configurations and implementations
rg -l "crustdata|CRUSTDATA" services/apps/premium/members_enrichment_worker/src/

# Check workflow implementations
ast-grep --pattern 'class $_ implements Workflow {
  $$$
}'

Length of output: 1833


Script:

#!/bin/bash
# Check the workflow implementation
rg -A 20 "class.*getMembersToEnrich" services/apps/premium/members_enrichment_worker/src/workflows/

# Check how Crustdata is integrated into the enrichment process
cat services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts

# Check if there are any special configurations for Crustdata
rg "CROWD_ENRICHMENT.*CRUSTDATA" services/apps/premium/members_enrichment_worker/src/

Length of output: 11890

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (4)

31-31: Consider documenting the activity threshold

The threshold of 1000 activities seems high. Consider adding a comment explaining the rationale behind this number or making it configurable through environment variables.


90-107: Enhance credits check implementation

Consider these improvements:

  1. The threshold of 5 credits should be documented or made configurable
  2. Add specific error handling for different types of API errors (e.g., network, authentication)
+ private readonly MIN_REQUIRED_CREDITS = 5;

  async hasRemainingCredits(): Promise<boolean> {
    try {
      const config = {
        method: 'get',
        url: `${process.env['CROWD_ENRICHMENT_CRUSTDATA_URL']}/user/credits`,
        headers: {
          Authorization: `Token ${process.env['CROWD_ENRICHMENT_CRUSTDATA_API_KEY']}`,
        },
      }

      const response: IMemberEnrichmentCrustdataRemainingCredits = (await axios(config)).data

-     return response.credits > 5
+     return response.credits > this.MIN_REQUIRED_CREDITS
    } catch (error) {
+     if (axios.isAxiosError(error)) {
+       if (error.response?.status === 401) {
+         this.log.error('Authentication failed with Crustdata API', error)
+       } else if (error.response?.status === 429) {
+         this.log.error('Rate limit exceeded for Crustdata API', error)
+       } else {
+         this.log.error('Error while checking Crustdata account usage', error)
+       }
+     }
      return false
    }
  }

158-161: Consider creating a dedicated type for enriched identities

The repeated use of this identity structure with additional properties suggests it could benefit from a dedicated interface.

+ interface IEnrichedMemberIdentity extends IMemberIdentity {
+   repeatedTimesInDifferentSources: number;
+   isFromVerifiedSource: boolean;
+ }

- const consumableIdentities: (IMemberIdentity & {
-   repeatedTimesInDifferentSources: number
-   isFromVerifiedSource: boolean
- })[] = []
+ const consumableIdentities: IEnrichedMemberIdentity[] = []

288-291: Add validation for employment data

Consider adding validation for the employment data to ensure all required fields are present before processing.

  private normalizeEmployment(
    data: IMemberEnrichmentDataCrustdata,
    normalized: IMemberEnrichmentDataNormalized,
  ): IMemberEnrichmentDataNormalized {
    const employmentInformation = (data.past_employers || []).concat(data.current_employers || [])
    if (employmentInformation.length > 0) {
      for (const workExperience of employmentInformation) {
+       if (!workExperience?.employer_name) {
+         this.log.warn('Skipping employment entry with missing employer name')
+         continue
+       }
        const identities = []
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 5c44fe2 and 773b750.

📒 Files selected for processing (1)
  • services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (1 hunks)
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts

[error] 86-86: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


[error] 191-191: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

🪛 GitHub Check: lint-format-services
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts

[failure] 1-1:
Delete ;⏎⏎


[failure] 5-5:
Delete ;


[failure] 6-6:
Delete ;


[failure] 7-7:
Replace ·IMemberEnrichmentCache,·IMemberIdentity,·MemberAttributeName,·MemberEnrichmentSource,·MemberIdentityType,·OrganizationIdentityType,·OrganizationSource,·PlatformType·}·from·'@crowd/types';⏎⏎ with ⏎··IMemberEnrichmentCache,⏎··IMemberIdentity,⏎··MemberAttributeName,⏎··MemberEnrichmentSource,⏎··MemberIdentityType,⏎··OrganizationIdentityType,⏎··OrganizationSource,⏎··PlatformType,⏎}·from·'@crowd/types'


[failure] 11-11:
Delete ;

🔇 Additional comments (1)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (1)

94-96: ⚠️ Potential issue

Add environment variable validation

Critical configuration values are accessed without validation. Consider adding a validation method to check required environment variables at startup.

+ private validateEnvVars() {
+   const required = [
+     'CROWD_ENRICHMENT_CRUSTDATA_URL',
+     'CROWD_ENRICHMENT_CRUSTDATA_API_KEY'
+   ];
+   const missing = required.filter(key => !process.env[key]);
+   if (missing.length > 0) {
+     throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
+   }
+ }

  constructor(public readonly log: Logger) {
    super(log)
+   this.validateEnvVars()
  }

Likely invalid or redundant comment.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (5)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (5)

39-41: Consider documenting the enrichment threshold rationale

The threshold of 1000 activities for enrichment eligibility seems arbitrary. Consider adding a comment explaining the rationale behind this number, or make it configurable through environment variables for easier adjustment.


73-96: Optimize cache iteration performance

The current implementation iterates through all caches even after finding a valid LinkedIn identity. Consider using Array.some() for better performance:

-    let hasEnrichableLinkedinInCache = false
-    for (const cache of caches) {
-      if (this.alsoFindInputsInSourceCaches.includes(cache.source)) {
-        const service = EnrichmentSourceServiceFactory.getEnrichmentSourceService(
-          cache.source,
-          this.log,
-        )
-        const normalized = service.normalize(cache.data) as IMemberEnrichmentDataNormalized
-        if (normalized.identities.some((i) => i.platform === PlatformType.LINKEDIN)) {
-          hasEnrichableLinkedinInCache = true
-          break
-        }
-      }
-    }
+    const hasEnrichableLinkedinInCache = caches.some(cache => {
+      if (!this.alsoFindInputsInSourceCaches.includes(cache.source)) return false;
+      const service = EnrichmentSourceServiceFactory.getEnrichmentSourceService(
+        cache.source,
+        this.log,
+      )
+      const normalized = service.normalize(cache.data) as IMemberEnrichmentDataNormalized
+      return normalized.identities.some((i) => i.platform === PlatformType.LINKEDIN)
+    })
🧰 Tools
🪛 Biome

[error] 94-94: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


110-111: Make credit threshold configurable

The credit threshold of 5 is hard-coded. Consider making this configurable through environment variables and adding documentation about why 5 credits are required.

+  private readonly REQUIRED_CREDITS = Number(process.env['CROWD_ENRICHMENT_CRUSTDATA_REQUIRED_CREDITS'] ?? 5)
+
   async hasRemainingCredits(): Promise<boolean> {
     try {
       const response: IMemberEnrichmentCrustdataRemainingCredits = (await axios(config)).data
-      // realtime linkedin enrichment costs 5 credits
-      return response.credits > 5
+      return response.credits > this.REQUIRED_CREDITS

165-220: Consider breaking down the complex identity handling logic

The findDistinctScrapableLinkedinIdentities method is complex and handles multiple responsibilities. Consider breaking it down into smaller, focused methods:

  1. Finding identities from caches
  2. Processing verified identities
  3. Handling deduplication

This would improve readability and maintainability.

🧰 Tools
🪛 Biome

[error] 204-204: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


91-95: Use optional chaining for safer property access

Consider using optional chaining to make the code more robust:

-      (hasEnrichableLinkedinInCache ||
-        (input.linkedin && input.linkedin.value && input.linkedin.verified))
+      (hasEnrichableLinkedinInCache || input.linkedin?.value && input.linkedin?.verified)

-    if (input.linkedin && input.linkedin.value && input.linkedin.verified) {
+    if (input.linkedin?.value && input.linkedin?.verified) {

Also applies to: 204-218

🧰 Tools
🪛 Biome

[error] 94-94: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 773b750 and 3f717a3.

📒 Files selected for processing (2)
  • services/apps/premium/members_enrichment_worker/src/main.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • services/apps/premium/members_enrichment_worker/src/main.ts
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts

[error] 94-94: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


[error] 204-204: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

🔇 Additional comments (1)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (1)

222-252: Well-structured normalization implementation

The normalization logic is well-organized with clear separation of concerns between different types of data (identities, attributes, employment). The use of separate methods for each aspect makes the code maintainable and easy to understand.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (7)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (5)

44-44: Consider documenting the rationale for the activity threshold.

The threshold of 1000 activities for member enrichment seems relatively high. Consider adding a comment explaining the reasoning behind this threshold to help future maintainers understand the decision.


96-100: Improve code robustness with optional chaining.

The conditional checks can be made more concise and safer using optional chaining.

-      input.activityCount > this.enrichMembersWithActivityMoreThan &&
-      (hasEnrichableLinkedinInCache ||
-        (input.linkedin && input.linkedin.value && input.linkedin.verified))
+      input.activityCount > this.enrichMembersWithActivityMoreThan &&
+      (hasEnrichableLinkedinInCache ||
+        input.linkedin?.value && input.linkedin?.verified)
🧰 Tools
🪛 Biome

[error] 99-99: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


115-116: Extract magic number to a named constant.

The credit cost (5) should be extracted to a named constant at the class level for better maintainability.

+  private static readonly REALTIME_LINKEDIN_ENRICHMENT_CREDIT_COST = 5;
+
   async hasRemainingCredits(): Promise<boolean> {
     try {
       const response: IMemberEnrichmentCrustdataRemainingCredits = (await axios(config)).data
-      // realtime linkedin enrichment costs 5 credits
-      return response.credits > 5
+      return response.credits > EnrichmentServiceCrustdata.REALTIME_LINKEDIN_ENRICHMENT_CREDIT_COST

176-231: Consider breaking down the complex identity handling method.

The findDistinctScrapableLinkedinIdentities method is quite complex and handles multiple responsibilities. Consider extracting the cache processing and verified identity handling into separate methods for better maintainability.

+  private processCache(
+    cache: IMemberEnrichmentCache<IMemberEnrichmentData>,
+    linkedinUrlHashmap: Map<string, number>,
+    consumableIdentities: Array<IMemberIdentity & {
+      repeatedTimesInDifferentSources: number;
+      isFromVerifiedSource: boolean;
+    }>,
+  ): void {
+    const service = EnrichmentSourceServiceFactory.getEnrichmentSourceService(
+      cache.source,
+      this.log,
+    )
+    const normalized = service.normalize(cache.data) as IMemberEnrichmentDataNormalized
+    if (normalized.identities.some((i) => i.platform === PlatformType.LINKEDIN)) {
+      this.processLinkedInIdentity(
+        normalized.identities.find((i) => i.platform === PlatformType.LINKEDIN),
+        linkedinUrlHashmap,
+        consumableIdentities,
+      )
+    }
+  }
🧰 Tools
🪛 Biome

[error] 215-215: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


281-290: Enhance email validation logic.

The current implementation splits the email string before validation. Consider validating the entire string first and then split to avoid unnecessary processing of invalid data.

-    if (data.email) {
-      for (const email of data.email.split(',').filter(isEmail)) {
+    if (data.email?.trim()) {
+      const emails = data.email
+        .split(',')
+        .map(e => e.trim())
+        .filter(e => e && isEmail(e))
+      for (const email of emails) {
         normalized.identities.push({
           type: MemberIdentityType.EMAIL,
           platform: this.platform,
-          value: email.trim(),
+          value: email,
           verified: false,
         })
       }
     }
services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts (2)

12-13: Consider using Date types for date fields.

Using Date types instead of string for date fields like start_date, end_date, last_updated, and last_tried_linkedin_enrichment_date can improve type safety and make date manipulations easier.

Apply this diff to update the date fields:

 export interface IMemberEnrichmentCrustdataEmployer {
   employer_name: string
   employer_linkedin_id: string
   employer_linkedin_description?: string
   employer_company_id: number[]
   employee_position_id?: number
   employee_title: string
   employee_description?: string
   employee_location: string
-  start_date: string
-  end_date: string
+  start_date: Date
+  end_date: Date
 }

 export interface IMemberEnrichmentDataCrustdata {
   linkedin_profile_url: string
   linkedin_flagship_url: string
   name: string
   email: string
   title: string
-  last_updated: string
+  last_updated: Date
   headline: string
   summary: string
   num_of_connections: number
   skills: string
   profile_picture_url: string
   twitter_handle: string
   languages: string[]
   all_employers: string[]
   past_employers: IMemberEnrichmentCrustdataEmployer[]
   current_employers: IMemberEnrichmentCrustdataEmployer[]
   all_employers_company_id: number[]
   all_titles: string[]
   all_schools: string[]
   all_degrees: string[]
   metadata: IMemberEnrichmentLinkedinScraperMetadata
 }

 export interface IMemberEnrichmentCrustdataAPIErrorResponse {
   error: string
   linkedin_profile_url: string
-  last_tried_linkedin_enrichment_date: string
+  last_tried_linkedin_enrichment_date: Date
   did_last_linkedin_enrichment_succeed: boolean
 }

Also applies to: 22-22, 47-47


17-18: Add validation or specific types for URLs and email fields.

To enhance data integrity, consider using specific types or adding validation for URL and email fields:

  • linkedin_profile_url, linkedin_flagship_url, and profile_picture_url could use a URL type or include validation to ensure they are valid URLs.
  • email could use a specific Email type or include validation to ensure it is a valid email address.

Apply this diff to update the types:

 export interface IMemberEnrichmentDataCrustdata {
-  linkedin_profile_url: string
-  linkedin_flagship_url: string
   name: string
-  email: string
   title: string
   last_updated: string
   headline: string
   summary: string
   num_of_connections: number
   skills: string
-  profile_picture_url: string
   twitter_handle: string
   languages: string[]
   all_employers: string[]
   past_employers: IMemberEnrichmentCrustdataEmployer[]
   current_employers: IMemberEnrichmentCrustdataEmployer[]
   all_employers_company_id: number[]
   all_titles: string[]
   all_schools: string[]
   all_degrees: string[]
   metadata: IMemberEnrichmentLinkedinScraperMetadata
+  linkedin_profile_url: URL
+  linkedin_flagship_url: URL
+  email: Email
+  profile_picture_url: URL
 }

+// Define types for Email and URL if not already defined
+type Email = string;
+type URL = string;

Also applies to: 20-20, 27-27

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3f717a3 and 033f52c.

📒 Files selected for processing (3)
  • services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts (1 hunks)
  • services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts

[error] 99-99: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


[error] 215-215: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

🔇 Additional comments (2)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (1)

145-159: 🛠️ Refactor suggestion

Enhance API call configuration and URL safety.

Several improvements can be made to the API call configuration:

  1. URL encode the LinkedIn handle to handle special characters
  2. Add timeout configuration to prevent hanging requests
   const config = {
     method: 'get',
     url,
     params: {
-      linkedin_profile_url: `https://linkedin.com/in/${handle}`,
+      linkedin_profile_url: `https://linkedin.com/in/${encodeURIComponent(handle)}`,
       enrich_realtime: false,
     },
     headers: {
       Authorization: `Token ${process.env['CROWD_ENRICHMENT_CRUSTDATA_API_KEY']}`,
     },
+    timeout: 30000, // 30 seconds timeout
   }

Likely invalid or redundant comment.

services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts (1)

26-26: Ensure consistency in the skills field data type.

The skills field is currently defined as a string, but considering that a member can have multiple skills, it might be more appropriate to define it as a string[] for consistency with similar fields like languages.

@epipav epipav merged commit 2568e4d into main Nov 7, 2024
7 checks passed
@epipav epipav deleted the feature/crustdata-linkedin-scraper branch November 7, 2024 12:25
@coderabbitai coderabbitai bot mentioned this pull request Nov 8, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature Created by Linear-GitHub Sync
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants