ΕΕΛΛΑΚ - Mailing Lists

Doubts in Who is who Alexandria3k GSOC 2026

Janhavi Kulkarni <janhavikul26 [ at ] gmail [ dot ] com>
11:27 PM
to info
Hi Diomidis,

I'm Janhavi, and I am applying for the "Who is who: Alexandria3k entity
disambiguation extensions" project for GSoC 2026. I have worked on several
NLP projects, including unsupervised clustering and topic modelling for
bias detection in news data. I’ve spent the past week going through the
codebase, reading papers on entity disambiguation, and exploring models and
large-scale implementations to better understand the project.

I have a few questions:

1. The project description mentions probabilistic clustering approaches.
Would it be okay to use external tools/libraries or a different approach if
they’re lightweight and clearly improve performance? Or is the expectation
that the solution should avoid external dependencies?

2. From what I understand, the two main constraints are high precision and
a lightweight / space-efficient implementation. If there’s a tradeoff
between the two, which should be prioritized? My interpretation is that
being lightweight might slightly outweigh precision, but I would like to
know what precision threshold would be acceptable for this project. I have
started trying out some models to see how they perform, but a little
guidance on this would be helpful.

3. Since ROR linking is in the same space, I'm thinking of following a
similar pattern for where this fits into the codebase, that is, in process
for populate (as mentioned in the project idea). However, rows whose
required column values are missing or null could be put in an "unknown"
cluster instead of being silently skipped (as in ROR linking).
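
To make point 3 concrete, here is a rough Python sketch of what I have in
mind. The column names (`family_name`, `given_name`), the `assign_clusters`
helper, and the "unknown" label are placeholders of mine, not anything taken
from the Alexandria3k schema, and the exact-match key only stands in for the
real clustering step.

```python
from collections import defaultdict

# Hypothetical label for rows that cannot be clustered.
UNKNOWN_CLUSTER = "unknown"


def assign_clusters(rows, required=("family_name", "given_name")):
    """Group row ids by a naive key built from the required columns.

    Rows with a missing, null, or blank required value are collected
    under UNKNOWN_CLUSTER instead of being dropped.
    """
    clusters = defaultdict(list)
    for row in rows:
        values = [row.get(col) for col in required]
        if any(v is None or not str(v).strip() for v in values):
            clusters[UNKNOWN_CLUSTER].append(row["id"])
            continue
        # Placeholder for the actual disambiguation logic:
        # a case-insensitive exact match on the required columns.
        key = tuple(str(v).strip().lower() for v in values)
        clusters[key].append(row["id"])
    return dict(clusters)
```

Keeping the incomplete rows in their own bucket would make it possible to
count them and revisit them later, rather than losing track of them.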

I’d really appreciate any other guidance or important pointers for this
project.
Thanks for your time!
----
You are receiving this message from the list: A discussion list for student developers and mentors of Google Summer of Code projects,
https://lists.ellak.gr/gsoc-developers/listinfo.html
You can unsubscribe from the list by sending an empty email to <gsoc-developers+unsubscribe [ at ] ellak [ dot ] gr>.
