Dear Mentors,

I'm Sushil Pandey, a third-year Computer Science Engineering student with a focus on AI & ML. I'm writing to express my sincere interest in contributing to the Flex GovDoc Scanner project for GSoC 2025.

Although I'm relatively new to open source contributions, I believe my experience with Node.js, REST API development, and document processing could be valuable for this project. I've worked with OCR systems and NLP techniques during my coursework and personal projects, which seems relevant to the metadata extraction requirements.

My understanding of the project is that it aims to transform public incorporation documents from Greece's ΓΕΜΗ portal into structured, searchable data through three main components:

1. *Document Crawling*: Using Node.js with Puppeteer to systematically collect PDFs from the portal
2. *Metadata Extraction*: Applying OCR and NLP techniques to extract key information like company names and legal representatives
3. *Search Service*: Developing a REST API with Express to enable efficient searching of the extracted data

I would initially focus on creating a small proof of concept demonstrating the end-to-end workflow. Then, based on your guidance, I would develop a detailed, realistic timeline and proposal for the project.

If possible, I would appreciate your insights on:

- Whether MongoDB, Elasticsearch, or a combination would be most suitable for this project
- The expected scale of documents the system should handle
- Any specific challenges with the ΓΕΜΗ portal I should be aware of
- Recommended tools for processing Greek language documents

I've begun exploring the portal structure and setting up a basic development environment, but I'm eager to align my approach with your expectations for the project.

Thank you for considering my interest. I'm excited about the opportunity to learn from your expertise and contribute meaningfully to this initiative.
--
Warm regards,
Sushil Pandey

GitHub: https://github.com/sushilpandeyy
LinkedIn: https://www.linkedin.com/in/contactsushil/
----
You are receiving this message from the list: A discussion list for student developers and mentors of Google Summer of Code projects, https://lists.ellak.gr/gsoc-developers/listinfo.html
You can unsubscribe from the list by sending an empty email to <gsoc-developers+unsubscribe [ at ] ellak [ dot ] gr>.