Dear Dimitrios Athanasopoulos, Nikos Tsekos, and Panagiotis Skarvelis,

My name is Saket Jha. I am a third-year Information Technology student at the National Institute of Technology Karnataka, Mangalore, India, with a GPA of 9.67. I am originally from Mumbai, Maharashtra.

I am applying for GSoC with Open Technologies Alliance – GFOSS and have prepared a proposal for the following project:

Project title: GlossAPI: Needs-Driven Evolution of the Dataset Production Pipeline for Greek Language Data

GitHub: <https://github.com/saketjha34>
Proposal: <https://docs.google.com/document/d/1FiH3ssMMxI29BFkdw0KnDkU44MbRy0S3FcFYUOxUNoc/edit?usp=sharing>
Portfolio: <https://saketjha34.github.io/>

I explored the GlossAPI repository and identified a few areas for improvement. I am considering two main approaches:

1. Building a structured ETL pipeline with Airflow, using Docker and Docker Compose for workflow management and deployment.
2. Improving the existing pipeline by enhancing OCR with MistralOCR, strengthening data preprocessing, and adding a data validation layer with Pydantic that runs before the data is stored in the database or exported as JSON.

I would be grateful if you could review my proposal and suggest any changes that would better align it with the project goals.

Thank you for your time and guidance. I look forward to your feedback.

Best regards,
Saket Jha