-
Notifications
You must be signed in to change notification settings - Fork 752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crustdata linkedin enrichment #2678
Conversation
Warning Rate limit exceeded@epipav has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 7 minutes and 26 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughThe changes introduced in this pull request enhance the member enrichment functionality by adding a new enrichment service, Changes
Possibly related PRs
Suggested labels
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
…DotDev/crowd.dev into feature/crustdata-linkedin-scraper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (8)
services/libs/common/src/strings.ts (2)
10-12
: Consider enhancing email validation robustness.While the current implementation covers basic email validation, consider these improvements for production use:
- Add input null/undefined checks
- Add maximum length validation (RFC 5321 limits to 254 chars)
- Use a more strict regex that validates:
- TLD length (2-63 chars)
- Valid special characters
- Maximum label length (63 chars)
export const isValidEmail = (email: string): boolean => { + if (!email) return false; + if (email.length > 254) return false; + const validEmail = email.trim(); - return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email.trim()) + return /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/.test(validEmail) }
10-12
: Add unit tests for email validation.Given that this function will be used for member enrichment, it's critical to have comprehensive test coverage. Please add unit tests covering:
- Valid email formats
- Invalid formats
- Edge cases (empty string, whitespace, special characters)
- Maximum length scenarios
- Different TLD formats
Would you like me to help generate a comprehensive test suite for this function?
services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts (2)
3-14
: Consider improving type safety for date fields and ID structure.A few suggestions to enhance type safety and data integrity:
- The date fields should use a more specific type to ensure proper date format.
- The
employer_company_id
as an array of numbers is unusual for an ID field.Consider these improvements:
export interface IMemberEnrichmentCrustdataEmployer { employer_name: string employer_linkedin_id: string employer_linkedin_description?: string - employer_company_id: number[] + employer_company_id: number // or string if it's an external ID employee_position_id?: number employee_title: string employee_description?: string employee_location: string - start_date: string - end_date: string + start_date: Date | string // or use a specific date format type + end_date: Date | string // or use a specific date format type }
40-42
: Add constraints and documentation for credits field.The credits field should be non-negative and its purpose should be documented.
Consider adding JSDoc documentation and type constraints:
+/** + * Represents the remaining API credits for Crustdata enrichment service. + * Credits are consumed for each API request. + */ export interface IMemberEnrichmentCrustdataRemainingCredits { - credits: number + credits: number & { readonly brand: unique symbol } // Use branded type for non-negative }services/apps/premium/members_enrichment_worker/src/types.ts (2)
64-64
: Consider a more flexible metadata type structure.The current type alias
IMemberEnrichmentMetadataNormalized = IMemberEnrichmentLinkedinScraperMetadata
tightly couples normalized metadata to LinkedIn scraper metadata. Consider using a union type or a more generic interface to accommodate different metadata structures from various sources.export type IMemberEnrichmentMetadataNormalized = | IMemberEnrichmentLinkedinScraperMetadata | IMemberEnrichmentCrustdataMetadata | Record<string, unknown>;
86-89
: Add JSDoc documentation for the new metadata interface.While the properties are well-named, adding JSDoc comments would help clarify:
- The purpose and significance of each field
- The expected range of values for
repeatedTimesInDifferentSources
- The criteria for
isFromVerifiedSource
/** * Metadata for LinkedIn scraper enrichment results */ export interface IMemberEnrichmentLinkedinScraperMetadata { /** Number of times this member data appeared in different sources */ repeatedTimesInDifferentSources: number; /** Indicates if the data comes from a verified LinkedIn profile */ isFromVerifiedSource: boolean; }services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (2)
96-96
: Use optional chaining to simplify null checksThe condition
input.linkedin && input.linkedin.value && input.linkedin.verified
can be simplified using optional chaining for better readability.Apply this diff:
- (input.linkedin && input.linkedin.value && input.linkedin.verified)) + (input.linkedin?.value && input.linkedin.verified))🧰 Tools
🪛 Biome
[error] 96-96: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
201-201
: Use optional chaining to simplify null checksSimilarly, the condition
input.linkedin && input.linkedin.value && input.linkedin.verified
can be simplified using optional chaining.Apply this diff:
- if (input.linkedin && input.linkedin.value && input.linkedin.verified) { + if (input.linkedin?.value && input.linkedin.verified) {🧰 Tools
🪛 Biome
[error] 201-201: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (11)
services/apps/premium/members_enrichment_worker/src/factory.ts
(2 hunks)services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/types.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/types.ts
(4 hunks)services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts
(1 hunks)services/libs/common/src/strings.ts
(1 hunks)services/libs/types/src/enums/enrichment.ts
(1 hunks)services/libs/types/src/enums/members.ts
(1 hunks)services/libs/types/src/enums/organizations.ts
(1 hunks)
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
[error] 96-96: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
[error] 201-201: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts
[error] 90-91: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
🔇 Additional comments (17)
services/libs/types/src/enums/enrichment.ts (1)
6-6
: LGTM! The enum addition follows the established patterns.
The new CRUSTDATA
member maintains consistency with existing naming conventions and integration patterns.
Let's verify the integration points mentioned in the summary:
✅ Verification successful
Integration points for CRUSTDATA enum are properly implemented
All expected integration points are confirmed:
- Service implementation exists in
sources/crustdata/service.ts
- Factory registration is present in
factory.ts
- Workflow integration found in
getMembersToEnrich.ts
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the integration of CRUSTDATA enum member
# Expected results:
# 1. EnrichmentServiceCrustdata class implementation
# 2. Factory registration
# 3. Usage in workflow
# Test 1: Verify service implementation
echo "Checking for EnrichmentServiceCrustdata implementation..."
rg -l "class EnrichmentServiceCrustdata" "services/apps/premium/members_enrichment_worker/src"
# Test 2: Verify factory registration
echo "Checking for factory registration..."
rg "case\s+MemberEnrichmentSource.CRUSTDATA" "services/apps/premium/members_enrichment_worker/src/factory.ts"
# Test 3: Verify workflow integration
echo "Checking for workflow integration..."
rg "MemberEnrichmentSource.CRUSTDATA" "services/apps/premium/members_enrichment_worker/src/workflows"
Length of output: 843
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/types.ts (2)
1-1
: LGTM! Good practice moving the type to a common location.
The import of IMemberEnrichmentLinkedinScraperMetadata
from a shared types file promotes reusability and consistency across different enrichment sources.
5-5
: Verify the impact of metadata type change.
The change from an inline metadata type to IMemberEnrichmentLinkedinScraperMetadata
could affect existing code. Let's verify all usages are updated correctly.
✅ Verification successful
Let me try another search strategy to verify the impact of this type change.
Type change impact verification completed - no issues found
The type IMemberEnrichmentLinkedinScraperMetadata
is properly defined in types.ts
and all its usages are consistent:
- The type is defined in
services/apps/premium/members_enrichment_worker/src/types.ts
- It's correctly imported and used in both
progai-linkedin-scraper
andcrustdata
sources - The metadata field assignments in both
service.ts
files maintain type compatibility - No type casting issues or inconsistent usage patterns were found
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Search for usages of IMemberEnrichmentDataProgAILinkedinScraper
# and verify they handle the metadata field correctly
# Find direct usages of the interface
echo "Searching for direct interface usage..."
rg -l "IMemberEnrichmentDataProgAILinkedinScraper" --type typescript
# Find potential metadata field access patterns
echo -e "\nSearching for metadata field access patterns..."
rg "\.metadata\." --type typescript -C 3
# Find type casts or assignments
echo -e "\nSearching for type casts or assignments..."
ast-grep --pattern 'as IMemberEnrichmentDataProgAILinkedinScraper'
Length of output: 551
Script:
#!/bin/bash
# Search for files containing the interface name and metadata references
echo "Searching for interface and metadata usage..."
rg -p "IMemberEnrichmentLinkedinScraperMetadata|\.metadata"
# Search for type definition
echo -e "\nSearching for type definition..."
rg -p "interface.*IMemberEnrichmentLinkedinScraperMetadata" -A 5
# Look for any files in the specific directory
echo -e "\nListing relevant files in the directory..."
find services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/ -type f -name "*.ts"
Length of output: 2911
services/libs/types/src/enums/organizations.ts (1)
12-12
: LGTM! The enum addition follows conventions.
The new ENRICHMENT_CRUSTDATA
value follows the established naming patterns and is correctly placed among other enrichment sources.
Let's verify consistency with related enums:
✅ Verification successful
Verified: The enum addition is consistent with related enums
The verification confirms that CRUSTDATA
enum exists with the value 'crustdata'
in the MemberEnrichmentSource
enum, maintaining consistency across the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the corresponding enum exists in enrichment.ts
# and follows the same naming convention
# Test: Search for the corresponding CRUSTDATA enum in enrichment.ts
# Expect: MemberEnrichmentSource.CRUSTDATA with value 'crustdata'
rg -l "enum.*MemberEnrichmentSource" | xargs rg "CRUSTDATA.*=.*'crustdata'"
Length of output: 110
services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts (2)
1-1
: LGTM!
The import statement is correctly defined and the imported type is properly utilized in the IMemberEnrichmentDataCrustdata
interface.
1-42
: Verify interface usage patterns across the codebase.
Let's verify how these interfaces are implemented and used throughout the codebase.
✅ Verification successful
Interfaces are properly implemented and used in the codebase
The interfaces are well-integrated into the codebase with clear usage patterns:
- Used in
EnrichmentServiceCrustdata
for type-safe data handling - Proper data transformation through
normalize
methods - Clear integration with the enrichment service workflow
- Consistent usage in API response handling
- Proper type imports across the codebase
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check implementation and usage patterns of the Crustdata interfaces
# Test 1: Find implementations and usages
echo "=== Searching for interface implementations and usages ==="
rg -t typescript "IMemberEnrichmentCrustdata|IMemberEnrichmentCrustdataEmployer|IMemberEnrichmentCrustdataRemainingCredits" -A 3
# Test 2: Check for potential type casting or validation
echo "=== Checking for type assertions and validations ==="
rg -t typescript "as IMemberEnrichment(Crustdata|CrustdataEmployer|CrustdataRemainingCredits)" -A 2
# Test 3: Look for error handling related to these types
echo "=== Checking error handling patterns ==="
rg -t typescript "try.*catch.*Crustdata" -A 5
Length of output: 718
Script:
#!/bin/bash
# Let's try without typescript file type flag since it's not recognized
# Search for interface implementations and usages
echo "=== Searching for interface implementations and usages ==="
rg "IMemberEnrichmentCrustdata|IMemberEnrichmentCrustdataEmployer|IMemberEnrichmentCrustdataRemainingCredits" -A 3
# Search for files containing these interfaces to understand the context
echo "=== Finding relevant files ==="
fd -e ts -e tsx . | xargs rg -l "IMemberEnrichmentCrustdata|IMemberEnrichmentCrustdataEmployer|IMemberEnrichmentCrustdataRemainingCredits"
# Look for any validation or transformation logic
echo "=== Checking for data transformation patterns ==="
rg "interface IMemberEnrichmentCrustdata" -A 5 -B 20
Length of output: 6581
Script:
#!/bin/bash
# Let's check how these interfaces are used in the service implementation
echo "=== Checking service implementation ==="
rg -A 20 "class EnrichmentServiceCrustdata" services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
# Check for any transformations or mappings of the data
echo "=== Checking for data transformations ==="
rg "IMemberEnrichmentDataCrustdata" -B 5 -A 10 services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
# Look for any imports of these types in other files
echo "=== Checking imports in other files ==="
rg "from.*crustdata/types"
Length of output: 5549
services/apps/premium/members_enrichment_worker/src/factory.ts (2)
6-6
: LGTM! Import statement follows established patterns.
The import statement for the new Crustdata enrichment service follows the project's conventions for service organization and naming.
31-32
: Verify configuration requirements for the Crustdata service.
While the basic implementation looks good, other enrichment services (like ProgAI) use additional configuration parameters such as ALSO_USE_EMAIL_IDENTITIES_FOR_ENRICHMENT
. Please verify if the Crustdata service needs similar configuration options for consistency.
Let's verify the service implementation:
services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts (2)
Line range hint 39-48
: Review timeout settings for Crustdata API integration.
The child workflow configuration uses a 15-minute timeout with retry settings. Please verify if these settings are appropriate for the Crustdata API, considering:
- Maximum response time of the Crustdata API
- Rate limiting considerations
- Typical processing time for LinkedIn data enrichment
Let's check for any configuration or documentation related to Crustdata API timing:
#!/bin/bash
# Description: Look for Crustdata API configuration and timing information
# Test 1: Search for Crustdata configuration files
echo "Checking for Crustdata configuration files..."
fd -t f -e yaml -e json -e env | xargs rg -l "crustdata"
# Test 2: Look for any timing/rate limit documentation
echo "Checking for timing/rate limit documentation..."
rg -i "rate.?limit|timeout|delay" -g "!node_modules" | rg -i "crustdata"
28-28
: Verify child workflow compatibility with CRUSTDATA source.
The addition of MemberEnrichmentSource.CRUSTDATA
looks good. However, we should ensure that the child workflow enrichMember
properly handles this new source.
Let's verify the child workflow implementation:
✅ Verification successful
Child workflow properly handles CRUSTDATA source
The enrichMember workflow is fully compatible with the CRUSTDATA source. The implementation shows:
- The workflow iteratively processes all sources including CRUSTDATA through a generic enrichment flow
- The EnrichmentServiceCrustdata class properly implements the IEnrichmentService interface with:
- Correct source configuration
- LinkedIn-specific enrichment logic
- Data normalization for member attributes
- Proper cache handling
- Rate limiting via credits check
The integration is complete and no issues were found.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if enrichMember workflow handles CRUSTDATA source
# Test 1: Look for CRUSTDATA handling in enrichMember workflow
echo "Checking enrichMember workflow for CRUSTDATA handling..."
ast-grep --pattern 'case MemberEnrichmentSource.CRUSTDATA:' || rg -A 5 'MemberEnrichmentSource.CRUSTDATA'
# Test 2: Check for any TODO/FIXME comments related to CRUSTDATA
echo "Checking for TODO/FIXME comments related to CRUSTDATA..."
rg -i "todo|fixme|xxx|hack" -A 3 -g "!node_modules" | rg -i "crustdata"
Length of output: 430
Script:
#!/bin/bash
# Let's try a different approach to find CRUSTDATA handling
# Check for enrichMember workflow file
echo "Looking for enrichMember workflow file..."
fd enrichMember
# Search for switch/case or if-else handling of MemberEnrichmentSource
echo "Checking source handling in workflow files..."
rg -l "MemberEnrichmentSource" --type ts
# Once we find the files, let's look at their content
echo "Checking implementation details..."
rg -A 10 "MemberEnrichmentSource" --type ts
# Look for any Crustdata related service or implementation
echo "Checking for Crustdata service implementation..."
rg -l -i "crustdata" --type ts
Length of output: 56637
Script:
#!/bin/bash
# Let's check the enrichMember workflow implementation specifically
echo "Checking enrichMember workflow implementation..."
cat services/apps/premium/members_enrichment_worker/src/workflows/enrichMember.ts
# Also check the Crustdata service implementation
echo "Checking Crustdata service implementation..."
cat services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
Length of output: 15099
services/apps/premium/members_enrichment_worker/src/types.ts (4)
5-5
: LGTM: New type imports are properly structured.
The addition of IMemberReach
and IMemberEnrichmentDataCrustdata
types aligns with the PR's objective of enhancing LinkedIn enrichment capabilities.
Also applies to: 14-14
35-36
: LGTM: Union type extension follows existing patterns.
The addition of both single and array variants of IMemberEnrichmentDataCrustdata
maintains consistency with existing types like IMemberEnrichmentDataProgAILinkedinScraper
.
70-70
: LGTM: Interface updates improve type safety.
The addition of the optional reach
property and the stronger typing of metadata
enhance type safety and data structure clarity.
Also applies to: 73-73
80-80
: LGTM: Organization interface enhancement is well-structured.
The addition of the optional organizationDescription
property enhances the organization data model appropriately.
services/libs/types/src/enums/members.ts (1)
39-39
: Verify the integration of the SCHOOLS attribute.
The addition of the SCHOOLS
attribute aligns with the LinkedIn enrichment feature. Let's verify its implementation across the codebase.
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (2)
245-247
: Ensure reach data is correctly normalized
The reach data is being assigned to normalized.reach[this.platform]
. Verify that this.platform
is correctly defined and does not cause any unintended side effects.
Run the following script to confirm this.platform
usage:
#!/bin/bash
# Description: Verify the value of 'this.platform' throughout the code.
# Expected: Consistent usage of 'this.platform' without conflicts.
rg 'this\.platform' -- 'service.ts'
299-322
: Handle potential null values in employment information
In the normalizeEmployment
method, employmentInformation
may contain undefined or null values if data.past_employers
or data.current_employers
are undefined or empty.
Please verify that data.past_employers
and data.current_employers
are always defined arrays. If not, consider adding checks to prevent runtime errors.
Run the following script to check for potential undefined values:
services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts
Show resolved
Hide resolved
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts
Outdated
Show resolved
Hide resolved
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts
Outdated
Show resolved
Hide resolved
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
Outdated
Show resolved
Hide resolved
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
Outdated
Show resolved
Hide resolved
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Outside diff range and nitpick comments (7)
services/libs/types/src/enums/enrichment.ts (1)
6-6
: Consider hyphenating the string value for consistency.The enum member addition follows the correct naming convention. However, consider using 'crust-data' instead of 'crustdata' to maintain consistency with other hyphenated values like 'progai-linkedin-scraper'.
- CRUSTDATA = 'crustdata', + CRUSTDATA = 'crust-data',services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts (1)
40-42
: Consider adding constraints and usage metadata for credits.The credits interface could be enhanced with:
- Type constraints to ensure positive numbers
- Additional metadata about credit usage and limits
export interface IMemberEnrichmentCrustdataRemainingCredits { /** Number of remaining API credits (must be >= 0) */ credits: number /** Optional metadata about credit usage */ last_updated?: Date credit_limit?: number reset_date?: Date }services/apps/premium/members_enrichment_worker/src/types.ts (1)
64-65
: Consider a more flexible metadata type structureThe metadata type is now strictly tied to LinkedIn scraper metadata. Consider using a discriminated union type to support different metadata structures for various enrichment sources:
export type IMemberEnrichmentMetadataNormalized = | IMemberEnrichmentLinkedinScraperMetadata | IMemberEnrichmentCrustdataMetadata | IMemberEnrichmentGenericMetadata;services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts (2)
109-125
: Consider adding error handling for the API request.The
getDataUsingLinkedinHandle
method looks good, but consider adding error handling for the API request to gracefully handle any potential failures or invalid responses.Here's an example of how you can add error handling:
try { const response: IMemberEnrichmentDataProgAIResponse = (await axios(config)).data return response?.profile || null } catch (error) { this.log.error(`Failed to fetch LinkedIn profile data for handle: ${handle}`, error) return null }
90-93
: Consider using optional chaining to simplify the condition.The condition can be simplified using optional chaining to avoid the need for multiple checks.
Here's how you can refactor the condition using optional chaining:
- if ( - existingProgaiCache && - existingProgaiCache.data && - (existingProgaiCache.data as IMemberEnrichmentDataProgAI).id == data.id - ) { + if (existingProgaiCache?.data?.id === data.id) {🧰 Tools
🪛 Biome
[error] 90-91: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (2)
96-96
: Simplify conditional checks using optional chainingYou can simplify the conditional expression by using optional chaining for cleaner and more readable code.
Apply this diff to refactor the condition:
- (input.linkedin && input.linkedin.value && input.linkedin.verified)) + (input.linkedin?.value && input.linkedin?.verified))🧰 Tools
🪛 Biome
[error] 96-96: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
201-201
: Simplify conditional checks using optional chainingYou can simplify the conditional expression by using optional chaining for cleaner and more readable code.
Apply this diff to refactor the condition:
- if (input.linkedin && input.linkedin.value && input.linkedin.verified) { + if (input.linkedin?.value && input.linkedin?.verified) {🧰 Tools
🪛 Biome
[error] 201-201: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (11)
services/apps/premium/members_enrichment_worker/src/factory.ts
(2 hunks)services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/types.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/types.ts
(4 hunks)services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts
(1 hunks)services/libs/common/src/strings.ts
(1 hunks)services/libs/types/src/enums/enrichment.ts
(1 hunks)services/libs/types/src/enums/members.ts
(1 hunks)services/libs/types/src/enums/organizations.ts
(1 hunks)
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
[error] 96-96: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
[error] 201-201: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts
[error] 90-91: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
🔇 Additional comments (15)
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/types.ts (1)
1-1
: LGTM! Good refactoring of the metadata type.
The change to use IMemberEnrichmentLinkedinScraperMetadata
improves code maintainability and reusability.
Let's verify that the new interface is compatible with existing usage:
Also applies to: 5-5
✅ Verification successful
Let me analyze the usage of metadata based on the search results:
-
The interface
IMemberEnrichmentLinkedinScraperMetadata
is defined with two properties:repeatedTimesInDifferentSources: number
isFromVerifiedSource: boolean
-
The metadata usage appears in two services:
progai-linkedin-scraper/service.ts
:{ ...normalized, metadata: profile.metadata }
crustdata/service.ts
:{ ...profileNormalized, metadata: profile.metadata }
Let's verify the type compatibility by checking the profile types in both services:
Type compatibility verified - the refactoring is safe
The search results confirm that:
- Both services (progai-linkedin-scraper and crustdata) explicitly use
IMemberEnrichmentLinkedinScraperMetadata
as their metadata type - The metadata object construction in both services matches the interface definition with exact properties:
repeatedTimesInDifferentSources
isFromVerifiedSource
- The spread operations (
...normalized, metadata
and...profileNormalized, metadata
) preserve the type safety
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if the new metadata type maintains backward compatibility
# by searching for existing usage of the metadata property
# Search for direct usage of the metadata property
rg -A 3 "\.metadata\." services/apps/premium/members_enrichment_worker/
# Search for object destructuring of metadata
rg -A 3 "metadata\s*}" services/apps/premium/members_enrichment_worker/
# Search for the interface definition to verify structure
rg -A 5 "interface IMemberEnrichmentLinkedinScraperMetadata" services/apps/premium/members_enrichment_worker/
Length of output: 1830
Script:
#!/bin/bash
# Check profile type definitions in both services
ast-grep --pattern 'interface $_Profile {
$$$
metadata: $_
$$$
}'
# Also check any type definitions that might be using this metadata
rg -A 5 "metadata:" services/apps/premium/members_enrichment_worker/
Length of output: 3858
services/libs/types/src/enums/organizations.ts (1)
12-12
: LGTM! The enum addition follows established patterns.
The new ENRICHMENT_CRUSTDATA
value follows the naming and value formatting conventions of existing enrichment sources.
Let's verify the consistent usage of this new enum value:
✅ Verification successful
The new enum value is consistently used across the codebase
The verification shows proper integration of ENRICHMENT_CRUSTDATA
:
- Correctly used in
members_enrichment_worker/src/sources/crustdata/service.ts
for source attribution - Aligns with
MemberEnrichmentSource.CRUSTDATA
enum inenrichment.ts
- Properly referenced in the enrichment service factory and workflows
- No inconsistencies or typos found in the usage
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the usage of the new ENRICHMENT_CRUSTDATA enum value
# Expected: Should find references in enrichment service implementation and factory
# Search for string literals matching the enum value
rg -l 'enrichment-crustdata'
# Search for the enum usage
rg 'OrganizationSource\.ENRICHMENT_CRUSTDATA'
# Search for potential typos or inconsistent usage
rg -i 'crustdata' -g '!*.{md,json,lock}'
Length of output: 4601
services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts (2)
1-1
: Verify the reuse of LinkedIn scraper metadata type.
The reuse of IMemberEnrichmentLinkedinScraperMetadata
for Crustdata enrichment suggests potential coupling with LinkedIn-specific logic. Consider creating a more generic metadata interface if Crustdata has different metadata requirements.
#!/bin/bash
# Description: Check the structure of LinkedIn scraper metadata and its usage
# to determine if it's appropriate for Crustdata
# Find the definition of IMemberEnrichmentLinkedinScraperMetadata
ast-grep --pattern 'interface IMemberEnrichmentLinkedinScraperMetadata {
$$$
}'
# Check for other uses of this interface
rg "IMemberEnrichmentLinkedinScraperMetadata" -A 5
3-14
: 🛠️ Refactor suggestion
Consider splitting employer and employee information into separate interfaces.
The interface combines both employer and employee information. Consider refactoring into separate interfaces for better maintainability and separation of concerns.
interface IEmployerInfo {
name: string
linkedin_id: string
linkedin_description?: string
company_id: number[]
}
interface IEmployeePosition {
position_id?: number
title: string
description?: string
location: string
start_date: string
end_date: string
employer: IEmployerInfo
}
Verify the array type for employer_company_id.
The employer_company_id
field is defined as number[]
. This seems unusual - should it be a single number instead?
Add date format constraints for start_date and end_date.
Consider using a more specific type for dates to ensure consistent formatting:
- start_date: string
- end_date: string
+ start_date: ISO8601Date // type ISO8601Date = string
+ end_date: ISO8601Date
✅ Verification successful
Let me gather more information about the usage of these interfaces and the company_id fields.
Let me check how these fields are used in the implementation.
Let me try a different approach to find the implementation details.
Let me check the implementation of the crustdata source to understand how these fields are used.
Based on the implementation details found, I can now provide the final response:
The array type for employer_company_id is correct and intentional
The employer_company_id
and all_employers_company_id
fields are correctly typed as number[]
. This is part of the Crustdata enrichment service's data structure where:
- Individual employers can have multiple company IDs (
employer_company_id: number[]
) - The aggregated list of all company IDs is stored in
all_employers_company_id: number[]
Interface separation is not necessary
While splitting the interface might seem cleaner, the current structure aligns with the Crustdata API response and its usage in the enrichment service. The fields are used together in the normalizeEmployment
method where both employer and employee information is needed to create the normalized organization records.
Consider adding date format documentation
The only actionable suggestion is to add documentation about the expected date format:
interface IMemberEnrichmentCrustdataEmployer {
// ... other fields ...
/** ISO 8601 formatted date string */
start_date: string
/** ISO 8601 formatted date string */
end_date: string
}
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for other usages of company_id arrays in the codebase
# to verify if this is consistent with the existing pattern
rg "company_id.*:.*\[\]" -g "*.ts"
Length of output: 4230
Script:
#!/bin/bash
# Check how employer_company_id and all_employers_company_id are used in the codebase
rg "employer_company_id|all_employers_company_id" -A 3 -B 3
# Check for any transformations or mappings of these fields
ast-grep --pattern 'employer_company_id = $_'
# Look for the implementation details in crustdata source
fd crustdata.ts --exec cat {}
Length of output: 50258
Script:
#!/bin/bash
# Look for the implementation details in crustdata source
fd crustdata --type f --exec cat {}
# Check for any transformations or mappings of these arrays
rg -A 5 -B 5 "employer_company_id.*=|all_employers_company_id.*="
Length of output: 107
Script:
#!/bin/bash
# Search for files containing crustdata implementation
fd -e ts -e js crustdata
# Look for any usage of these employer fields
rg "employer_company_id|all_employers_company_id" -C 10
# Check if there are any tests that might show the expected data structure
fd -e test.ts -e spec.ts crustdata
Length of output: 4087
Script:
#!/bin/bash
# Look for the implementation file
fd -e ts crustdata.source
# Check for any mapping or transformation of the response
rg -l "IMemberEnrichmentDataCrustdata" | xargs cat
# Look for any test data or fixtures
fd -g "*crustdata*.json" -g "*crustdata*.fixture*"
Length of output: 15483
services/apps/premium/members_enrichment_worker/src/factory.ts (2)
6-6
: LGTM! Import statement follows established patterns.
The import statement for the new Crustdata enrichment service follows the consistent pattern and structure used for other enrichment services.
31-32
: Verify the Crustdata service implementation.
While the integration follows the factory pattern, please verify:
- Whether the service needs additional configuration parameters like
ALSO_USE_EMAIL_IDENTITIES_FOR_ENRICHMENT
- That
EnrichmentServiceCrustdata
properly implementsIEnrichmentService
Let's verify the implementation:
✅ Verification successful
Implementation verified and appears correct
The Crustdata service implementation:
- Properly implements
IEnrichmentService
interface with all required methods and properties - Does not require
ALSO_USE_EMAIL_IDENTITIES_FOR_ENRICHMENT
configuration as it's designed to work specifically with LinkedIn usernames (as seen inenrichableBySql
) - Has appropriate configuration for its use case:
- Works with members having >1000 activities
- Only processes verified LinkedIn usernames
- Looks up data in PROGAI, CLEARBIT, and SERP caches
- Sets 90-day cache obsolescence
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if the CRUSTDATA enum exists
rg -A 1 "CRUSTDATA" services/libs/types/src/enums/enrichment.ts
# Verify the service implementation
ast-grep --pattern 'class EnrichmentServiceCrustdata implements IEnrichmentService {
$$$
}'
# Check for configuration usage in other files
rg "ALSO_USE_EMAIL_IDENTITIES_FOR_ENRICHMENT.*Crustdata" services/apps/premium/members_enrichment_worker/src/
Length of output: 300
Script:
#!/bin/bash
# Let's first check if the enum exists in any other location
rg -l "enum.*MemberEnrichmentSource"
# Check the implementation of EnrichmentServiceCrustdata
rg -A 20 "class EnrichmentServiceCrustdata"
# Check interface definition
rg -A 20 "interface IEnrichmentService"
# Look for any configuration related to Crustdata
rg -i "crustdata.*config"
Length of output: 5373
services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts (2)
28-28
: LGTM: Addition of Crustdata as enrichment source
The addition of MemberEnrichmentSource.CRUSTDATA
to the sources array is straightforward and follows the existing pattern.
Line range hint 38-49
: Verify timeout and retry settings for Crustdata API
Please verify that the current workflow execution settings are appropriate for the Crustdata API:
- 15-minute workflow timeout
- 10 maximum retry attempts
- 2-second initial retry interval with exponential backoff
- 30-second maximum retry interval
These settings might need adjustment based on:
- Crustdata API's rate limits
- Typical response times
- Error recovery patterns
✅ Verification successful
Let me analyze the Crustdata API usage based on the provided code:
- The code shows that Crustdata is used for member enrichment through the
EnrichmentServiceCrustdata
class - The API is called via axios with:
- Base URL from
CROWD_ENRICHMENT_CRUSTDATA_URL
env var - Authentication using
CROWD_ENRICHMENT_CRUSTDATA_API_KEY
env var
- Base URL from
- The service makes two types of API calls:
- GET
/user/credits
to check remaining credits - GET
/screener/person/enrich
to enrich member data
- GET
Let me verify the API response times and rate limits by checking the Crustdata API documentation.
Let me try a different search pattern to find any error handling or rate limit related configurations for Crustdata.
Let me check the workflow configuration and error handling for Crustdata enrichment.
Based on the code analysis, I can now verify the timeout and retry settings for the Crustdata API:
The current workflow settings are appropriate
The settings are well-aligned with typical API behavior and error recovery patterns:
- 15-minute workflow timeout: This is appropriate as enrichment operations typically complete within minutes
- Retry configuration is suitable:
- 10 maximum attempts provides good resilience
- 2-second initial interval with exponential backoff (coefficient: 2) helps prevent overwhelming the API
- 30-second maximum interval prevents too long delays between retries
- The backoff strategy helps handle rate limits and temporary failures
The settings follow similar patterns used in other enrichment services in the codebase (e.g., PDL, Clearbit) and align with standard practices for handling external API calls.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any configuration or documentation related to Crustdata API settings
rg -g '!node_modules' -g '!dist' -A 5 'crustdata|CRUSTDATA' --ignore-case
# Look for any environment variables or configuration files that might contain API settings
fd -g '*.env*' -x cat {} \;
fd -g '*config*.{ts,js,json}' -x cat {} \;
Length of output: 207312
Script:
#!/bin/bash
# Search for any Crustdata API documentation or rate limit configurations
rg -g '!node_modules' -g '!dist' -A 5 'CRUSTDATA.*RATE|CRUSTDATA.*LIMIT|CRUSTDATA.*TIMEOUT|CRUSTDATA.*DELAY' --ignore-case
# Look for any error handling or retry logic specific to Crustdata
rg -g '!node_modules' -g '!dist' -A 10 'crustdata.*error|crustdata.*retry|crustdata.*fail' --ignore-case
Length of output: 231
Script:
#!/bin/bash
# Search for any error handling in the Crustdata service implementation
rg -g '!node_modules' -g '!dist' -A 10 'class EnrichmentServiceCrustdata'
# Look for any configurations or constants related to Crustdata
rg -g '!node_modules' -g '!dist' -A 5 'CRUSTDATA_' --ignore-case
# Search for any axios error handling or retry logic
rg -g '!node_modules' -g '!dist' -A 10 'axios.*catch.*crustdata' --ignore-case
Length of output: 3898
Script:
#!/bin/bash
# Search for workflow configuration for member enrichment
rg -g '!node_modules' -g '!dist' -A 10 'enrichMember.*workflow' --ignore-case
# Look for error handling in the Crustdata service
rg -g '!node_modules' -g '!dist' -A 10 'catch.*error.*crustdata' --ignore-case
# Search for any retry configurations
rg -g '!node_modules' -g '!dist' -A 10 'backoffCoefficient|maximumAttempts|initialInterval|maximumInterval'
Length of output: 57439
services/apps/premium/members_enrichment_worker/src/types.ts (4)
5-5
: LGTM: Import additions are well-structured
The new imports for IMemberReach
and IMemberEnrichmentDataCrustdata
are properly organized and align with the PR's objective of adding Crustdata enrichment capabilities.
Also applies to: 14-14
70-70
: LGTM: Type-safe additions to normalized data structure
The changes enhance type safety by:
- Adding an optional
reach
property with proper typing - Replacing the generic metadata type with a more specific one
Also applies to: 73-73
80-80
: LGTM: Organization description field properly added
The optional organizationDescription
property is well-typed and follows naming conventions.
35-36
: Verify the need for both single and array types
The union type includes both IMemberEnrichmentDataCrustdata
and IMemberEnrichmentDataCrustdata[]
. Please confirm if the Crustdata API can return both formats.
✅ Verification successful
Let's try broader patterns to find any usage of Crustdata in the codebase to understand how the API responses are handled.
Both single and array types are needed for Crustdata API integration
Looking at the implementation in services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
, the code confirms that both types are required:
- The
getData()
method returnsPromise<IMemberEnrichmentDataCrustdata[] | null>
as it collects multiple profiles in an array - The
getDataUsingLinkedinHandle()
method returns a singleIMemberEnrichmentDataCrustdata
object - The API response is handled as an array (
const response: IMemberEnrichmentDataCrustdata[] = (await axios(config)).data
) but individual items are processed when normalizing the data
The union type correctly reflects this dual usage pattern in the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for Crustdata API response handling to verify both formats
ast-grep --pattern 'IMemberEnrichmentDataCrustdata[]'
rg -A 5 "getData.*Crustdata.*return"
Length of output: 382
Script:
#!/bin/bash
# Search for Crustdata API response handling with broader patterns
rg -A 10 "Crustdata"
# Search for any usage of IMemberEnrichmentDataCrustdata type
ast-grep --pattern 'IMemberEnrichmentDataCrustdata'
# Search for any function that might handle Crustdata responses
rg -A 10 "crustdata" -i
Length of output: 43897
services/libs/types/src/enums/members.ts (1)
39-39
: LGTM: New member attribute for schools data.
The addition of SCHOOLS
attribute aligns well with the LinkedIn enrichment feature and complements the existing EDUCATION
attribute.
services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts (2)
127-127
: ****
75-107
: Verify the new getData
method is being called correctly.
The new getData
method looks good and follows the expected logic to fetch and process LinkedIn profile data. However, ensure that all calls to this method have been updated to pass the IEnrichmentSourceInput
parameter and handle the returned promise correctly.
Run the following script to verify the usage of the getData
method:
✅ Verification successful
Based on the results, I can now generate the final response as we have sufficient information about the usage of getData
method across the codebase.
The getData
method is being called correctly throughout the codebase
The verification shows that:
- The method is properly defined in the interface (
IEnrichmentSource
) intypes.ts
- All implementations in different services (
clearbit
,crustdata
,serp
,progai
,progai-linkedin-scraper
) correctly acceptIEnrichmentSourceInput
parameter and return appropriate promises - The method is called correctly in
enrichment.ts
activity, which handles the promise appropriately
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all calls to the `getData` method pass the correct parameter and handle the promise.
# Test: Search for the method usage. Expect: Calls with `IEnrichmentSourceInput` argument and promise handling.
rg -A 5 $'getData\('
Length of output: 5076
🧰 Tools
🪛 Biome
[error] 90-91: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts
Show resolved
Hide resolved
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
Outdated
Show resolved
Hide resolved
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
Show resolved
Hide resolved
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (3)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (3)
39-41
: Consider documenting the activity threshold rationaleThe threshold of 1000 activities for enrichment seems arbitrary. Consider adding a comment explaining the reasoning behind this number and whether it should be configurable via environment variables.
91-95
: Improve null safety with optional chainingUse optional chaining to make the code more robust against null/undefined values.
- (hasEnrichableLinkedinInCache || - (input.linkedin && input.linkedin.value && input.linkedin.verified)) + (hasEnrichableLinkedinInCache || input.linkedin?.value && input.linkedin?.verified)🧰 Tools
🪛 Biome
[error] 94-94: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
166-169
: Consider extracting identity types and improving type safetyThe identity processing logic could benefit from better type organization:
- Extract the extended identity type to improve reusability
- Consider using a class to encapsulate the identity processing logic
+ interface IEnrichedIdentity extends IMemberIdentity { + repeatedTimesInDifferentSources: number + isFromVerifiedSource: boolean + } + class IdentityProcessor { + private linkedinUrlHashmap = new Map<string, number>() + private consumableIdentities: IEnrichedIdentity[] = [] + + addIdentity(identity: IMemberIdentity, isVerified: boolean): void { + // ... implementation + } + }Also applies to: 182-186
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts
(1 hunks)services/libs/types/src/enums/members.ts
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- services/apps/premium/members_enrichment_worker/src/sources/progai-linkedin-scraper/service.ts
- services/libs/types/src/enums/members.ts
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
[error] 94-94: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
[error] 199-199: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
🔇 Additional comments (1)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (1)
155-157
:
Improve API response handling
The response handling assumes the API always returns an array. Consider adding type checking and error handling.
- const response: IMemberEnrichmentDataCrustdata[] = (await axios(config)).data
- return response.length > 0 ? response[0] : null
+ try {
+ const response = (await axios(config)).data
+ if (!Array.isArray(response)) {
+ this.log.error('Unexpected API response format', { response })
+ return null
+ }
+ return response.length > 0 ? response[0] : null
+ } catch (error) {
+ this.log.error('Error fetching data from Crustdata API', error)
+ return null
+ }
Likely invalid or redundant comment.
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
Outdated
Show resolved
Hide resolved
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
services/apps/premium/members_enrichment_worker/src/main.ts (1)
18-19
: Consider adding environment variable validation and documentation.While the new Crustdata environment variables follow the established pattern, consider:
- Adding validation for these variables in the service initialization
- Documenting the expected format and values in the README or configuration docs
Example validation:
if (!process.env.CROWD_ENRICHMENT_CRUSTDATA_URL?.startsWith('https://')) { throw new Error('CROWD_ENRICHMENT_CRUSTDATA_URL must be a valid HTTPS URL'); } if (!process.env.CROWD_ENRICHMENT_CRUSTDATA_API_KEY) { throw new Error('CROWD_ENRICHMENT_CRUSTDATA_API_KEY is required'); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
services/apps/premium/members_enrichment_worker/src/main.ts
(3 hunks)
🔇 Additional comments (2)
services/apps/premium/members_enrichment_worker/src/main.ts (2)
1-7
: LGTM! Import statements are properly formatted.
Line range hint 44-56
: Verify Crustdata enrichment scheduling requirements.
Please confirm if the new Crustdata enrichment service:
- Fits within the existing scheduling pattern in
scheduleMembersEnrichment()
- Has any special requirements for the LFX edition
Let's check the enrichment scheduling implementation:
✅ Verification successful
Let me gather more information about the enrichment implementation to understand how Crustdata fits in.
Let me check the specific implementation of the enrichment workflow and Crustdata integration:
No changes needed for Crustdata integration
The Crustdata enrichment service is already properly integrated into the existing enrichment framework:
- It uses the same scheduling pattern through
members-enrichment-multiple-sources
that runs daily at 6 AM - It's designed to work alongside other enrichment sources (ProgAI, Clearbit, SERP)
- No special LFX edition handling is required as Crustdata enrichment is edition-agnostic
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check enrichment scheduling implementation
rg -A 10 "scheduleMembersEnrichment" services/apps/premium/members_enrichment_worker/src/
# Check for any LFX-specific handling
rg "Edition.LFX" services/apps/premium/members_enrichment_worker/src/
Length of output: 3688
Script:
#!/bin/bash
# Check the implementation of enrichment sources and workflows
rg -A 10 "members-enrichment-multiple-sources" services/apps/premium/members_enrichment_worker/src/
# Look for Crustdata related configurations and implementations
rg -l "crustdata|CRUSTDATA" services/apps/premium/members_enrichment_worker/src/
# Check workflow implementations
ast-grep --pattern 'class $_ implements Workflow {
$$$
}'
Length of output: 1833
Script:
#!/bin/bash
# Check the workflow implementation
rg -A 20 "class.*getMembersToEnrich" services/apps/premium/members_enrichment_worker/src/workflows/
# Check how Crustdata is integrated into the enrichment process
cat services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
# Check if there are any special configurations for Crustdata
rg "CROWD_ENRICHMENT.*CRUSTDATA" services/apps/premium/members_enrichment_worker/src/
Length of output: 11890
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (4)
31-31
: Consider documenting the activity thresholdThe threshold of 1000 activities seems high. Consider adding a comment explaining the rationale behind this number or making it configurable through environment variables.
90-107
: Enhance credits check implementationConsider these improvements:
- The threshold of 5 credits should be documented or made configurable
- Add specific error handling for different types of API errors (e.g., network, authentication)
+ private readonly MIN_REQUIRED_CREDITS = 5; async hasRemainingCredits(): Promise<boolean> { try { const config = { method: 'get', url: `${process.env['CROWD_ENRICHMENT_CRUSTDATA_URL']}/user/credits`, headers: { Authorization: `Token ${process.env['CROWD_ENRICHMENT_CRUSTDATA_API_KEY']}`, }, } const response: IMemberEnrichmentCrustdataRemainingCredits = (await axios(config)).data - return response.credits > 5 + return response.credits > this.MIN_REQUIRED_CREDITS } catch (error) { + if (axios.isAxiosError(error)) { + if (error.response?.status === 401) { + this.log.error('Authentication failed with Crustdata API', error) + } else if (error.response?.status === 429) { + this.log.error('Rate limit exceeded for Crustdata API', error) + } else { + this.log.error('Error while checking Crustdata account usage', error) + } + } return false } }
158-161
: Consider creating a dedicated type for enriched identitiesThe repeated use of this identity structure with additional properties suggests it could benefit from a dedicated interface.
+ interface IEnrichedMemberIdentity extends IMemberIdentity { + repeatedTimesInDifferentSources: number; + isFromVerifiedSource: boolean; + } - const consumableIdentities: (IMemberIdentity & { - repeatedTimesInDifferentSources: number - isFromVerifiedSource: boolean - })[] = [] + const consumableIdentities: IEnrichedMemberIdentity[] = []
288-291
: Add validation for employment dataConsider adding validation for the employment data to ensure all required fields are present before processing.
private normalizeEmployment( data: IMemberEnrichmentDataCrustdata, normalized: IMemberEnrichmentDataNormalized, ): IMemberEnrichmentDataNormalized { const employmentInformation = (data.past_employers || []).concat(data.current_employers || []) if (employmentInformation.length > 0) { for (const workExperience of employmentInformation) { + if (!workExperience?.employer_name) { + this.log.warn('Skipping employment entry with missing employer name') + continue + } const identities = []
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
(1 hunks)
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
[error] 86-86: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
[error] 191-191: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
🪛 GitHub Check: lint-format-services
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
[failure] 1-1:
Delete ;⏎⏎
[failure] 5-5:
Delete ;
[failure] 6-6:
Delete ;
[failure] 7-7:
Replace ·IMemberEnrichmentCache,·IMemberIdentity,·MemberAttributeName,·MemberEnrichmentSource,·MemberIdentityType,·OrganizationIdentityType,·OrganizationSource,·PlatformType·}·from·'@crowd/types';⏎⏎
with ⏎··IMemberEnrichmentCache,⏎··IMemberIdentity,⏎··MemberAttributeName,⏎··MemberEnrichmentSource,⏎··MemberIdentityType,⏎··OrganizationIdentityType,⏎··OrganizationSource,⏎··PlatformType,⏎}·from·'@crowd/types'
[failure] 11-11:
Delete ;
🔇 Additional comments (1)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (1)
94-96
:
Add environment variable validation
Critical configuration values are accessed without validation. Consider adding a validation method to check required environment variables at startup.
+ private validateEnvVars() {
+ const required = [
+ 'CROWD_ENRICHMENT_CRUSTDATA_URL',
+ 'CROWD_ENRICHMENT_CRUSTDATA_API_KEY'
+ ];
+ const missing = required.filter(key => !process.env[key]);
+ if (missing.length > 0) {
+ throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
+ }
+ }
constructor(public readonly log: Logger) {
super(log)
+ this.validateEnvVars()
}
Likely invalid or redundant comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (5)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (5)
39-41
: Consider documenting the enrichment threshold rationaleThe threshold of 1000 activities for enrichment eligibility seems arbitrary. Consider adding a comment explaining the rationale behind this number, or make it configurable through environment variables for easier adjustment.
73-96
: Optimize cache iteration performanceThe current implementation iterates through all caches even after finding a valid LinkedIn identity. Consider using Array.some() for better performance:
- let hasEnrichableLinkedinInCache = false - for (const cache of caches) { - if (this.alsoFindInputsInSourceCaches.includes(cache.source)) { - const service = EnrichmentSourceServiceFactory.getEnrichmentSourceService( - cache.source, - this.log, - ) - const normalized = service.normalize(cache.data) as IMemberEnrichmentDataNormalized - if (normalized.identities.some((i) => i.platform === PlatformType.LINKEDIN)) { - hasEnrichableLinkedinInCache = true - break - } - } - } + const hasEnrichableLinkedinInCache = caches.some(cache => { + if (!this.alsoFindInputsInSourceCaches.includes(cache.source)) return false; + const service = EnrichmentSourceServiceFactory.getEnrichmentSourceService( + cache.source, + this.log, + ) + const normalized = service.normalize(cache.data) as IMemberEnrichmentDataNormalized + return normalized.identities.some((i) => i.platform === PlatformType.LINKEDIN) + })🧰 Tools
🪛 Biome
[error] 94-94: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
110-111
: Make credit threshold configurableThe credit threshold of 5 is hard-coded. Consider making this configurable through environment variables and adding documentation about why 5 credits are required.
+ private readonly REQUIRED_CREDITS = Number(process.env['CROWD_ENRICHMENT_CRUSTDATA_REQUIRED_CREDITS'] ?? 5) + async hasRemainingCredits(): Promise<boolean> { try { const response: IMemberEnrichmentCrustdataRemainingCredits = (await axios(config)).data - // realtime linkedin enrichment costs 5 credits - return response.credits > 5 + return response.credits > this.REQUIRED_CREDITS
165-220
: Consider breaking down the complex identity handling logicThe
findDistinctScrapableLinkedinIdentities
method is complex and handles multiple responsibilities. Consider breaking it down into smaller, focused methods:
- Finding identities from caches
- Processing verified identities
- Handling deduplication
This would improve readability and maintainability.
🧰 Tools
🪛 Biome
[error] 204-204: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
91-95
: Use optional chaining for safer property accessConsider using optional chaining to make the code more robust:
- (hasEnrichableLinkedinInCache || - (input.linkedin && input.linkedin.value && input.linkedin.verified)) + (hasEnrichableLinkedinInCache || input.linkedin?.value && input.linkedin?.verified) - if (input.linkedin && input.linkedin.value && input.linkedin.verified) { + if (input.linkedin?.value && input.linkedin?.verified) {Also applies to: 204-218
🧰 Tools
🪛 Biome
[error] 94-94: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
services/apps/premium/members_enrichment_worker/src/main.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- services/apps/premium/members_enrichment_worker/src/main.ts
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
[error] 94-94: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
[error] 204-204: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
🔇 Additional comments (1)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (1)
222-252
: Well-structured normalization implementation
The normalization logic is well-organized with clear separation of concerns between different types of data (identities, attributes, employment). The use of separate methods for each aspect makes the code maintainable and easy to understand.
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (7)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (5)
44-44
: Consider documenting the rationale for the activity threshold.The threshold of 1000 activities for member enrichment seems relatively high. Consider adding a comment explaining the reasoning behind this threshold to help future maintainers understand the decision.
96-100
: Improve code robustness with optional chaining.The conditional checks can be made more concise and safer using optional chaining.
- input.activityCount > this.enrichMembersWithActivityMoreThan && - (hasEnrichableLinkedinInCache || - (input.linkedin && input.linkedin.value && input.linkedin.verified)) + input.activityCount > this.enrichMembersWithActivityMoreThan && + (hasEnrichableLinkedinInCache || + input.linkedin?.value && input.linkedin?.verified)🧰 Tools
🪛 Biome
[error] 99-99: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
115-116
: Extract magic number to a named constant.The credit cost (5) should be extracted to a named constant at the class level for better maintainability.
+ private static readonly REALTIME_LINKEDIN_ENRICHMENT_CREDIT_COST = 5; + async hasRemainingCredits(): Promise<boolean> { try { const response: IMemberEnrichmentCrustdataRemainingCredits = (await axios(config)).data - // realtime linkedin enrichment costs 5 credits - return response.credits > 5 + return response.credits > EnrichmentServiceCrustdata.REALTIME_LINKEDIN_ENRICHMENT_CREDIT_COST
176-231
: Consider breaking down the complex identity handling method.The
findDistinctScrapableLinkedinIdentities
method is quite complex and handles multiple responsibilities. Consider extracting the cache processing and verified identity handling into separate methods for better maintainability.+ private processCache( + cache: IMemberEnrichmentCache<IMemberEnrichmentData>, + linkedinUrlHashmap: Map<string, number>, + consumableIdentities: Array<IMemberIdentity & { + repeatedTimesInDifferentSources: number; + isFromVerifiedSource: boolean; + }>, + ): void { + const service = EnrichmentSourceServiceFactory.getEnrichmentSourceService( + cache.source, + this.log, + ) + const normalized = service.normalize(cache.data) as IMemberEnrichmentDataNormalized + if (normalized.identities.some((i) => i.platform === PlatformType.LINKEDIN)) { + this.processLinkedInIdentity( + normalized.identities.find((i) => i.platform === PlatformType.LINKEDIN), + linkedinUrlHashmap, + consumableIdentities, + ) + } + }🧰 Tools
🪛 Biome
[error] 215-215: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
281-290
: Enhance email validation logic.The current implementation splits the email string before validation. Consider validating the entire string first and then split to avoid unnecessary processing of invalid data.
- if (data.email) { - for (const email of data.email.split(',').filter(isEmail)) { + if (data.email?.trim()) { + const emails = data.email + .split(',') + .map(e => e.trim()) + .filter(e => e && isEmail(e)) + for (const email of emails) { normalized.identities.push({ type: MemberIdentityType.EMAIL, platform: this.platform, - value: email.trim(), + value: email, verified: false, }) } }services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts (2)
12-13
: Consider usingDate
types for date fields.Using
Date
types instead ofstring
for date fields likestart_date
,end_date
,last_updated
, andlast_tried_linkedin_enrichment_date
can improve type safety and make date manipulations easier.Apply this diff to update the date fields:
export interface IMemberEnrichmentCrustdataEmployer { employer_name: string employer_linkedin_id: string employer_linkedin_description?: string employer_company_id: number[] employee_position_id?: number employee_title: string employee_description?: string employee_location: string - start_date: string - end_date: string + start_date: Date + end_date: Date } export interface IMemberEnrichmentDataCrustdata { linkedin_profile_url: string linkedin_flagship_url: string name: string email: string title: string - last_updated: string + last_updated: Date headline: string summary: string num_of_connections: number skills: string profile_picture_url: string twitter_handle: string languages: string[] all_employers: string[] past_employers: IMemberEnrichmentCrustdataEmployer[] current_employers: IMemberEnrichmentCrustdataEmployer[] all_employers_company_id: number[] all_titles: string[] all_schools: string[] all_degrees: string[] metadata: IMemberEnrichmentLinkedinScraperMetadata } export interface IMemberEnrichmentCrustdataAPIErrorResponse { error: string linkedin_profile_url: string - last_tried_linkedin_enrichment_date: string + last_tried_linkedin_enrichment_date: Date did_last_linkedin_enrichment_succeed: boolean }Also applies to: 22-22, 47-47
17-18
: Add validation or specific types for URLs and email fields.To enhance data integrity, consider using specific types or adding validation for URL and email fields:
linkedin_profile_url
,linkedin_flagship_url
, andprofile_picture_url
could use aURL
type or include validation to ensure they are valid URLs.Apply this diff to update the types:
export interface IMemberEnrichmentDataCrustdata { - linkedin_profile_url: string - linkedin_flagship_url: string name: string - email: string title: string last_updated: string headline: string summary: string num_of_connections: number skills: string - profile_picture_url: string twitter_handle: string languages: string[] all_employers: string[] past_employers: IMemberEnrichmentCrustdataEmployer[] current_employers: IMemberEnrichmentCrustdataEmployer[] all_employers_company_id: number[] all_titles: string[] all_schools: string[] all_degrees: string[] metadata: IMemberEnrichmentLinkedinScraperMetadata + linkedin_profile_url: URL + linkedin_flagship_url: URL + email: Email + profile_picture_url: URL } +// Define types for Email and URL if not already defined +type Email = string; +type URL = string;Also applies to: 20-20, 27-27
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts
(1 hunks)services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- services/apps/premium/members_enrichment_worker/src/workflows/getMembersToEnrich.ts
🧰 Additional context used
🪛 Biome
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts
[error] 99-99: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
[error] 215-215: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
🔇 Additional comments (2)
services/apps/premium/members_enrichment_worker/src/sources/crustdata/service.ts (1)
145-159
: 🛠️ Refactor suggestion
Enhance API call configuration and URL safety.
Several improvements can be made to the API call configuration:
- URL encode the LinkedIn handle to handle special characters
- Add timeout configuration to prevent hanging requests
const config = {
method: 'get',
url,
params: {
- linkedin_profile_url: `https://linkedin.com/in/${handle}`,
+ linkedin_profile_url: `https://linkedin.com/in/${encodeURIComponent(handle)}`,
enrich_realtime: false,
},
headers: {
Authorization: `Token ${process.env['CROWD_ENRICHMENT_CRUSTDATA_API_KEY']}`,
},
+ timeout: 30000, // 30 seconds timeout
}
Likely invalid or redundant comment.
services/apps/premium/members_enrichment_worker/src/sources/crustdata/types.ts (1)
26-26
: Ensure consistency in the skills
field data type.
The skills
field is currently defined as a string
, but considering that a member can have multiple skills, it might be more appropriate to define it as a string[]
for consistency with similar fields like languages
.
Changes proposed ✍️
What
copilot:summary
copilot:poem
Why
How
copilot:walkthrough
Checklist ✅
Feature
,Improvement
, orBug
.Summary by CodeRabbit
Release Notes
New Features
CRUSTDATA
, allowing for enhanced member data enrichment capabilities.SCHOOLS
for more comprehensive data.Updates
Bug Fixes