Janhavi Kulkarni <janhavikul26 [ at ] gmail [ dot ] com> 11:27 PM, to info

Hi Diomidis,

I'm Janhavi, and I am applying for the "Who is who: Alexandria3k entity disambiguation extensions" project for GSoC 2026. I have done several NLP projects, one of which used unsupervised clustering and topic modelling for bias detection in news data. Over the past week I have been going through the codebase, reading papers and exploring models on entity disambiguation and its large-scale implementation, to better understand the project.

I have a few questions:

1. The project description mentions probabilistic clustering approaches. Would it be okay to use external tools/libraries, or a different approach, if they are lightweight and clearly improve performance? Or is the expectation that the solution should avoid external dependencies?

2. From what I understand, the two main constraints are high precision and a lightweight, space-efficient implementation. If there is a trade-off between the two, which should be prioritized? My interpretation is that being lightweight might slightly outweigh precision, but I would also like to know what a good precision threshold would be for this project. I have started trying out some models to see how they perform, but a little guidance on this would be helpful.

3. Since ROR linking is in the same space, I'm thinking of following a similar structure for where this fits into the codebase, that is, as a process step for a populated database (as mentioned in the project idea). However, rows that are missing the required column values (i.e., are null) could be put into an "unknown" cluster instead of being silently skipped (as is done for ROR linking).

I'd really appreciate any other guidance or important pointers for this project.

Thanks for your time!
----
You are receiving this message from the list: A discussion list for student developers and mentors of Google Summer of Code projects, https://lists.ellak.gr/gsoc-developers/listinfo.html
You can unsubscribe from the list by sending an empty email to <gsoc-developers+unsubscribe [ at ] ellak [ dot ] gr>.