EELLAK - Mailing Lists

Re: Regarding EDGAR - CRAWLER: Democratizing accessibility to financial NLP documents

Hi Fabian!

Thank you for your interest! It's great to see that this open-source
project contributes to other people's FinTech research, like yours.

1. Yes, automatic regex generation is one idea that I'd like to explore next.
It would be worth trying something like feeding a small batch of HTML
documents (one by one) to an LLM with a large context window (like Claude
Opus), having it write regexes, and refining them iteratively so that, in
the end, we have a "robust" regex that captures almost everything!
2. Yep, an edgar-crawler demo using Gradio and hosting it on HuggingFace
would be beneficial in order to make the project more accessible.

These are some wonderful ideas, and I'd love to put them to work after we
finish with the 8-K/10-Q documents :)
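
A minimal sketch of the iteration loop from point 1, under stated assumptions: the LLM is represented by a caller-supplied function (`propose`, a hypothetical name and signature), the starting pattern and sample documents are made up, and "robust" here only means "matches every document seen so far". It is meant to illustrate the idea, not how edgar-crawler does or will do it.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical signature: (current_pattern, failing_document) -> candidate pattern.
# In practice this would wrap an LLM call; here it is supplied by the caller.
Proposer = Callable[[str, str], str]


def matches(pattern: str, document: str) -> bool:
    """Return True if the pattern compiles and finds at least one match."""
    try:
        return re.search(pattern, document, flags=re.IGNORECASE) is not None
    except re.error:
        return False


def refine_pattern(documents: Iterable[str], propose: Proposer,
                   initial_pattern: str, max_attempts: int = 3) -> str:
    """Feed documents one by one, asking `propose` for a new pattern on each miss."""
    pattern = initial_pattern
    seen: List[str] = []
    for doc in documents:
        seen.append(doc)
        attempts = 0
        while not matches(pattern, doc) and attempts < max_attempts:
            candidate = propose(pattern, doc)
            # Accept a candidate only if it still covers every document seen so far.
            if all(matches(candidate, d) for d in seen):
                pattern = candidate
            attempts += 1
    return pattern


def toy_proposer(current: str, failing_doc: str) -> str:
    """Stand-in for the LLM: tolerates '&nbsp;' and makes the trailing dot optional."""
    return r"item(?:\s|&nbsp;)+1a\.?"


# Made-up sample snippets; real filings are far messier.
docs = ["<b>Item 1A. Risk Factors</b>", "<p>ITEM&nbsp;1A Risk Factors</p>"]
final_pattern = refine_pattern(docs, toy_proposer, r"item\s+1a\.")
```

Checking each candidate against all previously seen documents is one simple way to keep a later fix from breaking an earlier match; a real setup would probably also need negative examples so the regex does not become too permissive.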

Another idea is to create a "fallback extractor", i.e., if we run a
retrieval test and see that a section was not extracted for filing X, we
could route that job to an LLM. This would fill in missing sections in
EDGAR-CORPUS (the corpus created by edgar-crawler), but I'll need to check
this from a financial point of view first.
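
The fallback idea could look roughly like the sketch below. Everything in it is an assumption for illustration: the section keys (`item_1`, `item_1A`, `item_7`), the function names, and the `fallback` callable that stands in for the LLM route. The "retrieval test" is reduced to checking whether an expected section is absent or blank.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (raw_filing_text, section_name) -> extracted section text.
# A real version might wrap an LLM call; that part is deliberately left out.
Fallback = Callable[[str, str], str]


def missing_sections(extracted: Dict[str, str], expected: List[str]) -> List[str]:
    """The 'retrieval test': list expected sections that are absent or blank."""
    return [name for name in expected if not extracted.get(name, "").strip()]


def fill_missing(extracted: Dict[str, str], raw_text: str,
                 expected: List[str], fallback: Fallback) -> Dict[str, str]:
    """Return a copy of `extracted` with gaps filled by the fallback extractor."""
    result = dict(extracted)
    for name in missing_sections(extracted, expected):
        text = fallback(raw_text, name)
        if text.strip():  # keep the gap if the fallback also comes back empty
            result[name] = text
    return result


def toy_fallback(raw_text: str, section_name: str) -> str:
    """Stand-in for the LLM route: returns a fixed marker string."""
    return f"[fallback text for {section_name}]"


# Made-up extraction result with one blank and one absent section.
extracted = {"item_1": "Business overview ...", "item_1A": ""}
filled = fill_missing(extracted, "raw filing text",
                      ["item_1", "item_1A", "item_7"], toy_fallback)
```

Only the sections that fail the check go to the fallback, so the regex path stays the default and the more expensive route is used sparingly.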

Thank you again, and I am looking forward to your proposal!

On Mon, Mar 25, 2024 at 4:13 PM Fabian Billert <fabian [ dot ] billert [ at ] gmail [ dot ] com>
wrote:

> Hello,
>
> My name is Fabian Billert and I'm interested in the project
> "EDGAR-CRAWLER: Democratizing accessibility to Financial NLP documents".
> I'm a PhD student located in Düsseldorf, Germany, and I'm mostly focused on
> research in the financial NLP domain.
>
> While working on financial information extraction from different types of
> data, I have participated in some related workshops in the last 2 years: In
> two FinNLP workshops, I used different techniques to work with corporate
> sustainability information, and in another workshop on multimodal AI for
> financial forecasting I used sentiments extracted from news articles to
> create a sentiment-steered financial index. You can see some of the papers
> I presented there on my Google Scholar page at the bottom, although some
> of the work is still in the process of being published.
>
> During my work with different sources of data, I also wondered about
> extracting information from the SEC databases and found the EDGAR-CRAWLER
> project while looking for solutions. I am very excited about expanding the
> capabilities of the project and creating easy access to even more
> information.
> While thinking about the project, I also had some other ideas that might
> be interesting - for one, would it not be possible to automate the process
> of creating regular expressions using generative LLMs? This might
> accelerate the project a lot and also help with similar kinds of projects.
> I am also wondering if you are planning to create a user interface for the
> extracted data.
> Other than that, do you have any other plans for new features after we
> have finished adding new types of company filings?
>
> I'm excited to hear back from you soon.
>
> Best regards,
> Fabian Billert
>
>
> My CV: https://www.overleaf.com/read/yqfbrzpqbnzf#8e1844
> My LinkedIn: https://www.linkedin.com/in/fabian-billert-38b54b156/
> My GoogleScholar:
> https://scholar.google.de/citations?user=jTcST-gAAAAJ&hl=de
> ----
> You are receiving this message from the list: A discussion list for
> student developers and mentors of Google Summer of Code projects,
> https://lists.ellak.gr/gsoc-developers/listinfo.html
> You can unsubscribe from the list by sending an empty email to
> <gsoc-developers+unsubscribe [ at ] ellak [ dot ] gr>.
>
