
Optimizing Mentorship: A Hybrid NLP & ML Model for Automated Mentor-Student Matching

Mentor allocation is a key process for Líderes en Movimiento, a high-stakes leadership program. I developed a hybrid predictive model combining Vector Embeddings and XGBoost. The system automated the complex matching of students to business leaders, reducing the administrative workflow from a three-person committee to a single-person supervision task. The project served as a dual intervention: optimizing operational efficiency while acting as a diagnostic tool that revealed critical flaws in upstream data collection and user interface design.

The challenge:

Líderes en Movimiento connects high-potential university students with top-tier executives from the Mexican Council of Business (CMN). Mentorship is the cornerstone of the program, where students define their professional goals and learn to navigate industry challenges. Previously, matching hundreds of applicants to mentors was a manual process performed by a committee. This approach was time-consuming and prone to fatigue-induced bias. The challenge was to engineer a system that could quantify the qualitative "fit" between a student’s career aspirations and a mentor’s expertise, ensuring high-quality pairings at scale without losing the human nuance required for successful mentorship.


What I did:

I built a hybrid Success Prediction Model that treated the matching process as an optimization problem rather than a simple sorting task.

  • Semantic Analysis: I utilized Vector Embeddings (using sentence_transformers) to map the semantic topology of student objectives and mentor profiles. This allowed the system to identify matches based on conceptual alignment rather than relying on rigid keyword matching.

  • Predictive Modeling: I trained an XGBoost regressor on historical satisfaction data from previous cohorts. The model used features such as semantic similarity scores, demographic alignment, and personality traits to predict the potential "Success Index" of any given pair.

  • Algorithmic Optimization: To finalize the pairings, I implemented Linear Sum Assignment, which processed the predicted success scores to generate a global optimum—a set of matches that maximized the total satisfaction of the entire cohort, rather than just individual pairs.
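The similarity and assignment steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the production pipeline: the vectors are made-up toy values standing in for sentence_transformers embeddings, cosine similarity is used directly as the pair score in place of the XGBoost-predicted "Success Index", and the helper names are hypothetical. The assignment itself uses SciPy's linear_sum_assignment, which accepts rectangular score matrices (more mentor candidates than students).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def assign_mentors(scores):
    """Return (student, mentor) index pairs that maximize the total score.

    Each student gets exactly one mentor and each mentor is used at most once
    (assuming there are at least as many mentors as students).
    """
    rows, cols = linear_sum_assignment(scores, maximize=True)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]


# Toy 3-dimensional "embeddings" (hypothetical values):
# 2 students (rows) and 3 mentor candidates (rows).
student_vecs = np.array([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0]])
mentor_vecs = np.array([[0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0],
                        [1.0, 0.0, 0.0]])

# Stand-in for the predicted score of every student-mentor pair.
scores = cosine_similarity_matrix(student_vecs, mentor_vecs)
pairs = assign_mentors(scores)
total = float(sum(scores[s, m] for s, m in pairs))

# A hand-written score matrix where giving student 0 their single best
# mentor (score 0.9) would leave student 1 with a poor match (0.1).
tricky = np.array([[0.9, 0.8],
                   [0.8, 0.1]])
tricky_pairs = assign_mentors(tricky)
```

In the second matrix, a greedy pass that hands each student their top-scoring mentor in turn reaches a total of 1.0, while the assignment solver trades a slightly lower score for student 0 to reach 1.6 overall. That is the sense in which the pairing optimizes the cohort rather than individual pairs.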


This model was tailored to the available input data. The final result was a draft match list that considers students' goals, mentors' profiles and expertise, demographic similarity, and our historical data on good matches.

Results:

The model was pilot-tested this year with a cohort of 60 students and 200 mentor candidates.

  • Operational Efficiency: The system generated pre-ranked priority lists, transforming the workflow from a manual deliberation requiring three staff members into a validation task for a single administrator.

  • Scalability: The pilot suggested that the pipeline can absorb substantially larger applicant pools without a corresponding increase in administrative time.


What I learned:

While the model was technically successful, its deployment acted as a powerful diagnostic. I discovered that our technical potential was bottlenecked by design flaws:

  1. Input Quality: The application forms did not scaffold students to articulate clear goals, resulting in vague text that limited the NLP model's precision.

  2. Data Skew: Historical satisfaction data was heavily skewed toward "Excellent" (ceiling effect), making it difficult for the XGBoost model to distinguish between a "good" match and a "transformative" one.
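A quick way to surface the ceiling effect before training is to look at how much of the target sits at the top label. The sketch below is illustrative only: the rating labels other than "Excellent", the counts, and the function names are invented for the example and are not the program's actual data.

```python
from collections import Counter


def rating_distribution(ratings):
    """Share of each rating label in the list, as a dict of label -> fraction."""
    counts = Counter(ratings)
    n = len(ratings)
    return {label: count / n for label, count in counts.items()}


def ceiling_share(ratings, top_label):
    """Fraction of ratings that sit at the top of the scale."""
    return sum(1 for r in ratings if r == top_label) / len(ratings)


# Made-up satisfaction ratings for 20 past pairs (hypothetical data).
ratings = ["Excellent"] * 17 + ["Good"] * 2 + ["Fair"] * 1

distribution = rating_distribution(ratings)
share = ceiling_share(ratings, "Excellent")
```

When most of the mass sits on one label, as in this toy sample, the target carries little variance for a regressor to learn from, so differences between predicted scores for strong pairs should be read with caution.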

This experience taught me that sophisticated algorithms cannot compensate for flawed interaction design. True optimization requires refining the human input mechanisms first. Consequently, for the next cycle, I am redesigning the intake forms and proposing a goal-setting workshop to ensure the data fed into the model reflects the true depth of the students' needs.


© All rights reserved 2025
