Project

Candidate Ranking System

AI-powered candidate ranking system that combines semantic embeddings and structured signals to surface the best-fit candidates from a 100K pool, without keyword matching.

Problem

  • Traditional recruiting tools rank by keyword overlap, rewarding profile stuffing over genuine fit.
  • This system reads career narratives for meaning instead of matching tags.
  • It weighs behavioral signals like availability and engagement, and penalizes implausible profiles.
  • Built for the Redrob Data & AI Challenge, it ranks a 100K pool against a job description in under 5 minutes on CPU — no GPU, no external APIs.

Approach

Two-phase architecture: precompute once, rank many

  • Phase 1 does the heavy, query-independent work: structured scores plus the BGE embedding of every candidate.
  • Phase 1 writes six parallel artifacts to artifacts/: candidate_ids.npy, semantic_scores.npy, three Track-1 score arrays, and track1_details.pkl.
  • Phase 2 (phase2/rank.py) loads those artifacts and ranks with no model load, no network, and no GPU.
  • The expensive embedding pass is paid once per population; re-ranking is pure NumPy.
  • The demo (demo_pipeline.py) collapses both phases into one in-memory pass since uploaded pools have no precomputed artifacts.
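
The precompute-once, rank-many split can be sketched as below. This is a minimal illustration, not the project's code: the function names precompute and load_artifacts are hypothetical, only two of the six artifacts are shown, and the sample scores are made up.

```python
import tempfile
from pathlib import Path

import numpy as np


def precompute(artifact_dir: Path, ids: np.ndarray, semantic: np.ndarray) -> None:
    """Phase 1 (sketch): persist arrays once per candidate population."""
    artifact_dir.mkdir(parents=True, exist_ok=True)
    np.save(artifact_dir / "candidate_ids.npy", ids)
    np.save(artifact_dir / "semantic_scores.npy", semantic)


def load_artifacts(artifact_dir: Path) -> tuple[np.ndarray, np.ndarray]:
    """Phase 2 (sketch): plain array loads; no model, network, or GPU."""
    ids = np.load(artifact_dir / "candidate_ids.npy")
    semantic = np.load(artifact_dir / "semantic_scores.npy")
    return ids, semantic


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "artifacts"
    precompute(out, np.array([101, 102, 103]), np.array([0.71, 0.64, 0.80]))
    ids, semantic = load_artifacts(out)
    # Re-ranking is pure NumPy: sort candidate ids by score, highest first.
    top = ids[np.argsort(-semantic, kind="stable")]
```

Because Phase 2 only touches saved arrays, it can be rerun as often as needed without paying for the embedding pass again.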

Candidate text representation (Track 2)

  • build_candidate_text serializes each candidate as: current role → career history → summary → skills → education.
  • Career history is recency-weighted via CAREER_TOKEN_BUDGET: 125 tokens for the most recent job, then 94, 62, and title-and-company-only for the 4th+ role.
  • Skills trail the narrative as a subordinate, noisy signal, and beginner proficiency is dropped (_PROFICIENCY_RANK keeps only intermediate/advanced/expert).
  • Certifications are excluded as low-signal.
  • Both candidate and JD text get BGE instruction prefixes for asymmetric retrieval.
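
The serialization order and recency weighting above might look roughly like this. It is a sketch under several assumptions: the candidate dictionary layout is hypothetical, whitespace-separated words stand in for tokenizer tokens, the shape of CAREER_TOKEN_BUDGET is guessed from the budgets quoted above, and the BGE instruction prefixes are left out.

```python
CAREER_TOKEN_BUDGET = (125, 94, 62)  # per-role budgets from the text, most recent first
KEPT_PROFICIENCY = {"intermediate", "advanced", "expert"}  # beginner is dropped


def truncate(text: str, budget: int) -> str:
    # Assumption: whitespace-separated words stand in for tokenizer tokens.
    return " ".join(text.split()[:budget])


def build_candidate_text_sketch(candidate: dict) -> str:
    parts = [f"Current role: {candidate['current_title']}"]
    for i, job in enumerate(candidate.get("career", [])):  # assumed most recent first
        line = f"{job['title']} at {job['company']}"
        if i < len(CAREER_TOKEN_BUDGET):
            desc = truncate(job.get("description", ""), CAREER_TOKEN_BUDGET[i])
            if desc:
                line += f": {desc}"
        parts.append(line)  # 4th+ roles keep title and company only
    parts.append(candidate.get("summary", ""))
    skills = [
        s["name"]
        for s in candidate.get("skills", [])
        if s.get("proficiency", "").lower() in KEPT_PROFICIENCY
    ]
    if skills:
        parts.append("Skills: " + ", ".join(skills))  # skills trail the narrative
    if candidate.get("education"):
        parts.append("Education: " + candidate["education"])
    return "\n".join(p for p in parts if p)


example = {
    "current_title": "Senior ML Engineer",
    "career": [
        {"title": "Senior ML Engineer", "company": "A", "description": "w " * 200},
        {"title": "ML Engineer", "company": "B", "description": "x " * 200},
        {"title": "Data Analyst", "company": "C", "description": "y " * 200},
        {"title": "Intern", "company": "D", "description": "z " * 200},
    ],
    "summary": "Builds ranking systems.",
    "skills": [
        {"name": "Python", "proficiency": "expert"},
        {"name": "Excel", "proficiency": "beginner"},
    ],
    "education": "B.Tech, Computer Science",
}
text = build_candidate_text_sketch(example)
```

In this example the fourth role collapses to "Intern at D" and the beginner-level skill never reaches the text that gets embedded.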

Embedding model

  • The model is BAAI/bge-base-en-v1.5, a bi-encoder.
  • Each candidate and the JD (build_jd_text) are encoded independently.
  • The semantic score is cosine similarity, which reduces to a plain dot product once both vectors are L2-normalized (cosine_similarity_matrix).
  • The bi-encoder lets candidate vectors be precomputed and reused across queries, matching the two-phase split.
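
The scoring step itself needs no model, as this minimal NumPy sketch shows. The helper names are hypothetical and random vectors stand in for real embeddings; 768 is the output dimension of bge-base-en-v1.5.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Divide each vector by its Euclidean norm, guarding against zero vectors.
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def cosine_scores(candidates: np.ndarray, jd: np.ndarray) -> np.ndarray:
    """Cosine similarity of every candidate row against one JD vector."""
    return l2_normalize(candidates) @ l2_normalize(jd)


rng = np.random.default_rng(0)
cand_vecs = rng.normal(size=(5, 768))  # stand-ins for precomputed candidate embeddings
jd_vec = rng.normal(size=768)          # stand-in for the encoded job description
scores = cosine_scores(cand_vecs, jd_vec)
```

Since the candidate matrix does not depend on the JD, it can be normalized and stored once, leaving a single matrix-vector product per query.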

Structured signals as multipliers (Track 1)

  • All three scores are in [0, 1] and read fields only through the field_map.py accessor layer.
  • Hard filter (track1_hard_filter.py) is the product location × yoe × work_mode × consulting_penalty × tenure.
  • Hard filter scores Pune/Noida 1.0, other tier-1 cities 0.9, the JD's 5–9 YOE band 1.0 (0.85 outer band), all-consulting careers (TCS/Infosys/Wipro/etc.) 0.3, and sub-12-month average tenure 0.50.
  • Availability (track1_availability.py) is a weighted average: open_to_work (0.30), last_active (0.25), response_time (0.20), notice_period (0.15), recruiter-response (0.05), applications-30d (0.05).
  • Availability excludes interview_completion_rate as a weak discriminator.
  • Credibility (track1_credibility.py) weights profile completeness (0.25), log-scaled endorsements (0.20), education tier (0.20), GitHub (0.20), LinkedIn (0.10), verified contact (0.05).
  • A -1 GitHub sentinel maps to 0.0, distinguishing "no account" from "low activity."
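
The two combination rules above (a product for the hard filter, weighted averages for availability and credibility) can be sketched as follows. The weights and the 242 endorsement cap come from this write-up; the dictionary keys, function names, and the exact log scaling are assumptions, and each sub-score is assumed to be already mapped into [0, 1].

```python
import math

AVAILABILITY_WEIGHTS = {
    "open_to_work": 0.30,
    "last_active": 0.25,
    "response_time": 0.20,
    "notice_period": 0.15,
    "recruiter_response": 0.05,
    "applications_30d": 0.05,
}
CREDIBILITY_WEIGHTS = {
    "profile_completeness": 0.25,
    "endorsements": 0.20,
    "education_tier": 0.20,
    "github": 0.20,
    "linkedin": 0.10,
    "verified_contact": 0.05,
}


def weighted_average(subscores: dict, weights: dict) -> float:
    # Each sub-score is assumed to lie in [0, 1] already.
    return sum(w * subscores[name] for name, w in weights.items()) / sum(weights.values())


def hard_filter(location, yoe, work_mode, consulting_penalty, tenure) -> float:
    # Product form: every factor scales the result, so one low factor dominates.
    return math.prod((location, yoe, work_mode, consulting_penalty, tenure))


def endorsement_score(count: int, max_count: int = 242) -> float:
    # Log scaling is from the text; this particular formula is an assumption.
    return min(1.0, math.log1p(max(count, 0)) / math.log1p(max_count))


# A candidate who is strong everywhere except a long notice period.
subscores = {name: 1.0 for name in AVAILABILITY_WEIGHTS}
subscores["notice_period"] = 0.0
availability = weighted_average(subscores, AVAILABILITY_WEIGHTS)  # about 0.85

# Tier-1 city (0.9), in-band YOE, consulting-only career (0.3), short tenure (0.5).
hf = hard_filter(0.9, 1.0, 1.0, 0.3, 0.5)  # about 0.135
```

The contrast is the point: a zero on one averaged sub-signal costs at most its weight, while a low hard-filter factor scales everything.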

Final score and ranking

  • Phase 2 min-max normalizes semantic scores across the population (normalize_semantic).
  • final_score = semantic_norm × hard_filter × availability × credibility.
  • The multiplicative form lets any single weak signal drag down the whole score: a consulting-only career, for example, multiplies it by 0.3 no matter how strong the semantic match is.
  • rank_top_n stable-sorts via np.lexsort in descending order, keyed on final_score first with semantic_norm and then hard_filter as tie-breakers, and slices the top 100.
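
A minimal sketch of the normalize, multiply, and sort steps follows. The function names carry a _sketch suffix because they are illustrative rather than the project's implementation, and the handling of a flat population (all scores equal) is an assumption. One detail matters in practice: np.lexsort treats the last key as the primary one, and keys are negated to get descending order.

```python
import numpy as np


def normalize_semantic_sketch(s: np.ndarray) -> np.ndarray:
    lo, hi = s.min(), s.max()
    if hi == lo:
        return np.zeros_like(s, dtype=float)  # assumption: flat population maps to 0
    return (s - lo) / (hi - lo)


def rank_top_n_sketch(semantic, hard, avail, cred, n=100):
    sem_norm = normalize_semantic_sketch(semantic)
    final = sem_norm * hard * avail * cred
    # Last key is primary: final, then sem_norm, then hard as tie-breakers.
    order = np.lexsort((-hard, -sem_norm, -final))
    return order[:n], final


semantic = np.array([0.60, 0.80, 0.70, 0.80])
hard = np.array([1.0, 0.3, 1.0, 1.0])  # candidate 1 carries the consulting penalty
avail = np.full(4, 0.9)
cred = np.full(4, 0.8)
order, final = rank_top_n_sketch(semantic, hard, avail, cred)
```

Candidates 1 and 3 tie for the best semantic score, but the 0.3 multiplier pushes candidate 1 below candidate 2, whose semantic match is weaker.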

Reasoning generation

  • build_reasoning emits a per-candidate header with role, YOE, and location.
  • It adds a semantic-match phrase tuned to the normalized score, and lists only top skills the candidate actually has.
  • _career_clause fingerprints the previous row so consecutive current-role snippets never duplicate.
  • A Concerns section is driven by Track-1 sub-signals: long notice period, no GitHub, consulting background, short tenure.
  • _jd_hits surfaces JD-relevant areas only from skills/text the candidate actually has.
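
The grounding rule (mention only what the candidate actually has) and the signal-driven concerns can be illustrated as below. Everything here is hypothetical: the function names, the signal keys, the phrasing, and the 60-day notice threshold are assumptions. Only the -1 GitHub sentinel and the idea of penalty multipliers below 1.0 come from the text.

```python
def jd_hits_sketch(jd_terms, candidate_skills, candidate_text):
    """Return JD terms that appear in the candidate's own skills or text."""
    skills = {s.lower() for s in candidate_skills}
    haystack = candidate_text.lower()
    return [t for t in jd_terms if t.lower() in skills or t.lower() in haystack]


def concerns_sketch(signals: dict) -> list[str]:
    """Turn Track-1 sub-signals into human-readable concerns."""
    concerns = []
    if signals.get("notice_period_days", 0) > 60:  # hypothetical threshold
        concerns.append("long notice period")
    if signals.get("github", 0) == -1:  # sentinel for "no account"
        concerns.append("no GitHub")
    if signals.get("consulting_penalty", 1.0) < 1.0:
        concerns.append("consulting background")
    if signals.get("tenure", 1.0) < 1.0:
        concerns.append("short tenure")
    return concerns


hits = jd_hits_sketch(
    ["Python", "Kubernetes", "Spark"],
    candidate_skills=["python"],
    candidate_text="Built Spark pipelines for ad ranking.",
)
concerns = concerns_sketch(
    {"notice_period_days": 90, "github": -1, "consulting_penalty": 0.3, "tenure": 1.0}
)
```

Kubernetes is in the JD but nowhere in the profile, so it is left out of the reasoning instead of being implied.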

Honeypot avoidance

  • There is no separate honeypot detector; resistance is emergent.
  • Keyword-stuffed profiles gain little because skills are subordinate and the bi-encoder scores narrative meaning.
  • Implausible combinations (consulting-only history, sub-12-month tenure, zero applications) hit the multiplicative penalties and collapse the final score.

Architecture

[Figure: architecture diagram]

Screenshots

[Screenshot: results]

Limitations

  • The demo caps at MAX_CANDIDATES = 100 and runs in memory, skipping the 100K path, artifact caching, and streaming top-N loader.
  • Demo semantic normalization is relative to the uploaded set, so scores are not comparable across runs.
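  • Uploads with missing fields fall back to fixed defaults (avg_response_time → 280.0, notice_period → 90, github → -1). These are not all neutral: 280.0 is the top of the hardcoded response-time range and -1 is the GitHub "no account" sentinel, so partial profiles can be penalized as well as flattered.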
  • field_map.py hardcodes EDA-derived ranges (GITHUB_SCORE_MAX = 96.9, LAST_ACTIVE 23–263 days, RESPONSE_TIME 2.1–280h, ENDORSEMENTS_MAX = 242) that won't transfer to other distributions.
  • _JD_OFFICE_CITIES = {'pune', 'noida'} is hardcoded because the parser can't separate "office location" from "welcome to apply."
  • interview_completion_rate and standalone skill_assessment_scores are excluded for sparsity; current_company_size, India-based willing_to_relocate, and certifications carry little weight.
  • A missing/unparseable last_active_date scores 0.0 (fully stale, not neutral).
  • consulting_penalty falls back to current-company matching when career history is empty.
  • Consulting detection is substring-based, so a non-consulting firm whose name contains a flagged token can be falsely flagged.
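
The substring limitation is easy to reproduce. In this sketch the token list is only the subset named earlier (the real list may differ), "Fastcsolve Labs" is an invented company name, and the word-boundary variant is one possible mitigation rather than something the project implements.

```python
import re

FLAGGED = ("tcs", "infosys", "wipro")  # subset named in the text; real list may differ


def is_consulting_substring(company: str) -> bool:
    # Substring test: matches the token anywhere inside the name.
    name = company.lower()
    return any(token in name for token in FLAGGED)


_WORD_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, FLAGGED)) + r")\b", re.IGNORECASE
)


def is_consulting_word(company: str) -> bool:
    # Word-boundary test: the token must stand alone as a word.
    return _WORD_RE.search(company) is not None


false_positive = is_consulting_substring("Fastcsolve Labs")  # "tcs" hides inside the name
stricter = is_consulting_word("Fastcsolve Labs")
true_positive = is_consulting_word("TCS Digital")
```

Word boundaries remove this class of false positive, but they would also miss concatenated spellings, so the trade-off needs checking against real company names.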

Future Improvements

  • Add a cross-encoder second pass re-scoring only the top-K (~200), slotting between rank_top_n and reasoning.
  • Fine-tune bge-base-en-v1.5 (LoRA/QLoRA) on recruiter relevance labels for sharper JD-to-candidate discrimination.
  • Promote skill_assessment_scores to a weighted credibility component once coverage is higher.
  • Use company_size, current_industry, and the parsed seniority_level (currently unused in scoring) as additional fit signals.
  • Learn Track-1 weights and the combination from ground-truth labels (e.g. learning-to-rank) instead of hand-tuning.
  • Add explicit honeypot checks: YOE-vs-career-duration consistency, boilerplate-description detection beyond _career_clause, and outlier endorsement/GitHub combinations.