Dear GlossAPI Mentors,

I am Thanos, and I am writing to express my strong interest in the "GlossAPI ML-assisted Anonymization Layer" project for GSoC 2026.

I hold a PhD in Physics and am currently pursuing a Master's degree in Machine Learning and AI, where my coursework focuses heavily on Deep Learning, NLP, and PyTorch. I also have 4 years of professional experience as a software engineer, with a solid understanding of backend technologies and API integration.

This project aligns closely with my interest in Natural Language Processing and AI. I have a solid foundation in Python, having used it extensively during my PhD to solve complex differential equations numerically. My professional engineering background means I can not only build and evaluate robust NER models for Greek text, but also integrate the final anonymization module into the existing GlossAPI pipeline.

As I begin researching the models and structuring my proposal, I have some technical questions to ensure my approach aligns with your goals:

1. Baseline Models: For the ML-based approach, do you have a preference for starting with specific pre-trained models (such as GreekBERT), or would you like my proposal to include a comparative evaluation pipeline of lightweight vs. transformer-based models?

2. Handling OCR Noise: Since OCR noise is highlighted as a specific challenge, do you have a small sample dataset containing typical errors that I could analyse? This would help me ensure that my proposed preprocessing and masking rules are robust against real-world data.

3. Pipeline Integration: How do you envision this module being integrated into GlossAPI? For example, should it be designed as a standalone microservice, or as a direct Python package dependency within the existing pipeline?

Thank you for your time and for maintaining this project.
If you have any other observations or architectural preferences that could help me shape my proposal, they would be more than welcome. I look forward to hearing from you.

Best regards,
Thanos Smponias
----
You are receiving this message from the list: A discussion list for student developers and mentors of Google Summer of Code projects, https://lists.ellak.gr/gsoc-developers/listinfo.html
You can unsubscribe from the list by sending an empty email to <gsoc-developers+unsubscribe [ at ] ellak [ dot ] gr>.