ΕΕΛΛΑΚ - Mailing Lists

Seeking Guidance on Flex GovDoc Scanner Approach - GSoC 2025

Dear Mentors,

I'm Sushil Pandey, a third-year Computer Science Engineering student
focusing on AI & ML. I'm writing to express my interest in contributing to
the Flex GovDoc Scanner project for GSoC 2025.

Although I'm relatively new to open source contributions, I believe my
experience with Node.js, REST API development, and document processing
could be valuable for this project. I've worked with OCR systems and NLP
techniques during my coursework and personal projects, which seems relevant
to the metadata extraction requirements.

My understanding of the project is that it aims to transform public
incorporation documents from Greece's ΓΕΜΗ portal into structured,
searchable data through three main components:

   1. *Document Crawling*: Using Node.js with Puppeteer to systematically
   collect PDFs from the portal
   2. *Metadata Extraction*: Applying OCR and NLP techniques to extract key
   information like company names and legal representatives
   3. *Search Service*: Developing a REST API with Express to enable
   efficient searching of the extracted data
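
To make the search step (component 3) a little more concrete, here is a minimal sketch in plain Node.js with no dependencies. It is only an illustration under my own assumptions: the function names, the record shape, and the sample companies are all made up, and a real service would more likely delegate matching to a database or search engine. It shows one issue I expect with Greek text, namely that queries should match regardless of case, accents, or final sigma.

```javascript
// Hypothetical sketch: names, record shape, and sample data are invented
// for illustration and do not come from the project.

// Fold Greek text for matching: strip accents (tonos, dialytika),
// lowercase, map final sigma to regular sigma, collapse whitespace.
function normalizeGreek(text) {
  return text
    .normalize("NFD")                 // split letters from combining marks
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .replace(/ς/g, "σ")
    .replace(/\s+/g, " ")
    .trim();
}

// Precompute one normalized search string per extracted record.
function buildIndex(records) {
  return records.map((record) => ({
    record,
    haystack: normalizeGreek(
      `${record.companyName} ${record.representatives.join(" ")}`
    ),
  }));
}

// Return the records whose normalized text contains every query term.
function search(index, query) {
  const terms = normalizeGreek(query).split(" ").filter(Boolean);
  return index
    .filter((entry) => terms.every((term) => entry.haystack.includes(term)))
    .map((entry) => entry.record);
}

// Made-up records standing in for the output of the extraction step.
const index = buildIndex([
  { id: "doc-1", companyName: "Άλφα Τεχνική ΑΕ", representatives: ["Μαρία Παπαδοπούλου"] },
  { id: "doc-2", companyName: "Βήτα Εμπορική ΕΠΕ", representatives: ["Γιώργος Νικολάου"] },
]);

console.log(search(index, "ΑΛΦΑ τεχνικη").map((r) => r.id)); // [ 'doc-1' ]
```

An Express route could wrap `search` later; I kept the sketch to the matching logic because that is the part most affected by the Greek-language questions below.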

I would first build a small proof of concept that demonstrates the
end-to-end workflow. Then, based on your guidance, I would develop a
detailed timeline and a more realistic proposal for the project.

If possible, I would appreciate your insights on:

   - Whether MongoDB, Elasticsearch, or a combination would be most
   suitable for this project
   - The expected scale of documents the system should handle
   - Any specific challenges with the ΓΕΜΗ portal I should be aware of
   - Recommended tools for processing Greek language documents

I've begun exploring the portal structure and setting up a basic
development environment, but I'm eager to align my approach with your
expectations for the project.

Thank you for considering my interest. I'm excited about the opportunity to
learn from your expertise and contribute meaningfully to this initiative.
-- 

Warm regards,
Sushil Pandey

GitHub: https://github.com/sushilpandeyy
LinkedIn: https://www.linkedin.com/in/contactsushil/
----
You are receiving this message from the list: A discussion list for student developers and mentors of Google Summer of Code projects.
https://lists.ellak.gr/gsoc-developers/listinfo.html
You can unsubscribe from the list by sending an empty email to <gsoc-developers+unsubscribe [ at ] ellak [ dot ] gr>.