Hi Devesh,

Thank you for your interest. In EDGAR-CRAWLER, we automate the download and extraction of US financial documents from the Securities and Exchange Commission's EDGAR database. We do that using web scraping techniques (Beautiful Soup) and classic NLP methods such as regular expressions, converting the .html/.txt documents into well-organized JSON files in order to speed up NLP pipelines that require such data.

We have written a short 4-page technical report that describes the whole process:
https://drive.google.com/file/d/1VQQejx4bWEsgKG7sRoAYH1vj4hzQoqfU/view?usp=sharing

I believe this document will give you the more comprehensive insights you requested. Feel free to reach out if you have any more questions.

Best,

On Thu, Mar 14, 2024 at 12:17 PM Devesh Negi <devesh [ dot ] negi22 [ at ] gmail [ dot ] com> wrote:

> I hope this email finds you well. My name is Devesh Negi, and I am writing
> to express my interest in participating in the Google Summer of Code (GSOC)
> 2024 program under the "EDGAR-CRAWLER: Democratizing accessibility to
> Financial NLP documents" project. I am intrigued by the objectives of this
> project and eager to contribute to its success.
>
> I am pursuing BTech from IIT Madras. I'm proficient in Python and AI/ML
> and have a strong interest in NLP. I am eager to gain a deeper
> understanding of the "EDGAR-CRAWLER" project. Could you kindly provide more
> comprehensive insights into the project's objectives, the specific
> challenges it aims to address, and the methodologies or technologies it
> employs? Additionally, I would greatly appreciate any guidance on how best
> to approach and tackle the complexities inherent in the project. Also any
> recommended resources or reading materials to familiarize myself with the
> project beforehand would be immensely valuable.
>
> Looking forward to hearing from you soon.
>
> Warm regards,
> Devesh Negi
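
P.S. To give a rough feel for the extraction step mentioned in the reply, here is a minimal sketch, not the project's actual code. It assumes a made-up sample filing (`SAMPLE_FILING`), hypothetical helper names (`strip_html`, `split_into_items`), and illustrative output keys such as `item_1A`. To stay dependency-free it strips tags with the standard library instead of Beautiful Soup, then uses a regular expression to split the text at "Item N." headings and prints the result as JSON.

```python
import html
import json
import re

# Hypothetical sample: a tiny stand-in for an annual-report filing.
SAMPLE_FILING = """
<html><body>
<p>Item 1. Business</p><p>We make widgets.</p>
<p>Item 1A. Risk Factors</p><p>Demand may fall.</p>
<p>Item 7. Management&#39;s Discussion</p><p>Revenue grew.</p>
</body></html>
"""

# Matches headings such as "Item 1." or "Item 1A." at the start of a line.
ITEM_HEADING = re.compile(r"^Item\s+(\d+[A-Z]?)\.", re.IGNORECASE | re.MULTILINE)


def strip_html(raw: str) -> str:
    """Replace tags with newlines, decode entities, and drop empty lines."""
    text = re.sub(r"<[^>]+>", "\n", raw)
    text = html.unescape(text)
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def split_into_items(text: str) -> dict:
    """Map each item label (e.g. "item_1A") to the text up to the next heading."""
    matches = list(ITEM_HEADING.finditer(text))
    items = {}
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following is not None else len(text)
        label = f"item_{current.group(1).upper()}"
        items[label] = text[current.start():end].strip()
    return items


items = split_into_items(strip_html(SAMPLE_FILING))
print(json.dumps(items, indent=2))
```

Real filings are far messier than this sample (tables of contents that repeat the headings, inconsistent casing and punctuation), which is where most of the effort in a real extractor goes.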
----
You are receiving this message from the list: A discussion list for student developers and mentors of Google Summer of Code projects, https://lists.ellak.gr/gsoc-developers/listinfo.html
You can unsubscribe from the list by sending an empty email to <gsoc-developers+unsubscribe [ at ] ellak [ dot ] gr>.