Duplicate Detection Batch Scheduler
Component Detail
Infrastructure
medium complexity
backend
1
Dependencies
0
Dependents
1
Entities
0
Integrations
Description
Cron-scheduled infrastructure job that triggers the nightly duplicate detection scan across all active organizations. Runs after business hours to minimize database contention, uses the composite index on activities for efficient cross-record comparison, and emits a structured report of new flags raised and previously resolved duplicates re-detected.
duplicate-detection-batch-scheduler
Responsibilities
- Schedule and invoke DuplicateDetectionService.runBatchDetection for all active organizations
- Log batch run metadata including duration, records scanned, and flags raised
- Handle failures per organization in isolation so one failing org does not abort the full batch
- Expose a manual trigger endpoint for admin-initiated on-demand scans
Interfaces
scheduleBatchRun(cronExpression: string): void
triggerManualRun(orgId?: string): Promise<BatchDetectionReport>
getLastRunReport(orgId: string): Promise<BatchRunMetadata>