ΕΕΛΛΑΚ (EELLAK) - Mailing Lists

Re: Interest and some Questions about "Exploring and Abstracting Triplestore Alternatives" Project

Subject: Re: Interest and some Questions about "Exploring and Abstracting Triplestore Alternatives" Project
From: Maira Papadopoulou <mmpapadopoulouu [ at ] gmail [ dot ] com>
Date: Tue, 25 Mar 2025 18:46:24 +0200

Dear Alexios Zavras,

Thank you very much for your detailed and insightful feedback. I truly
appreciate the time you took to guide me with essential advice for the
project.

I understand that the proposal should include a detailed time plan, and I
fully intend to provide one. My intention was first to clarify the overall
structure and approach of the project, so that the timeline I draft
is aligned with the project's objectives and practical expectations. Now
that I have a clearer understanding, I will proceed with drafting a
structured and realistic workplan, most likely with weekly granularity, as you
suggested.

To give the project a solid and practical foundation, I have
already downloaded and begun working with Blazegraph, Apache Jena, GraphDB,
and KuzuDB. These four systems offer a diverse yet manageable set of
backends and are easy to use, which allows me to focus on building a
complete and functional abstraction layer while still covering a broad range of
implementation capabilities.

Regarding the design approach, thank you for your response about the REST
API idea; I realize I was overcomplicating the solution. Given the
classification of the project as "Long" (350 hours), I initially assumed a
more complex system would be required to meet expectations. However, your
proposed model, centered on a clear object-oriented design
with an abstract TripleStore interface and concrete implementations for
specific backends, seems simpler and more practical. Your
reference to SQLAlchemy was especially helpful in illustrating the simplicity and
effectiveness of such an approach. In line with this direction, I have
already looked for existing Python libraries that support
integration with the triplestores I mentioned. In particular, I came across
SPARQLWrapper, which can connect to the SPARQL endpoints of Blazegraph, Apache Jena, and
GraphDB, as well as kuzu, the dedicated Python library for KuzuDB. These seem
like solid starting points for building the abstraction layer and implementing a
complete, working library with all four triplestore alternatives, as you
suggested.
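Since Blazegraph, Apache Jena (through its Fuseki server) and GraphDB all expose SPARQL 1.1 Protocol endpoints, which is what SPARQLWrapper talks to, one HTTP-based backend class could plausibly cover all three, while KuzuDB (queried in Cypher through the `kuzu` package) would need its own class. The sketch below uses only the standard library to show the request such a backend would send; the helper name `build_sparql_request` and the Fuseki URL `http://localhost:3030/ds/sparql` are illustrative assumptions, not part of any of these libraries.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint: str, text: str, is_update: bool = False) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol POST request.

    Hypothetical helper: queries go in a form field named "query" and
    updates in one named "update", as the protocol specifies for
    URL-encoded POST requests.
    """
    field = "update" if is_update else "query"
    body = urlencode({field: text}).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    if not is_update:
        # Ask for the standard SPARQL JSON results format.
        headers["Accept"] = "application/sparql-results+json"
    return Request(endpoint, data=body, headers=headers, method="POST")


# Assumed local Fuseki dataset "ds"; Blazegraph or GraphDB would differ only in URL.
req = build_sparql_request(
    "http://localhost:3030/ds/sparql",
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5",
)
print(req.get_method(), req.full_url)
```

Some servers expect updates at a different URL than queries (Fuseki, for example, commonly exposes a separate `/update` service), so a backend class would likely need to store both endpoints.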

Do you think this is a sound approach for the Development part of the
project, or would you recommend any adjustments or any additional
considerations before I proceed further? I would be grateful for any
further advice you might have.

Thank you once again for your thoughtful guidance!

Best regards,
Maira Papadopoulou


On Fri, 21 Mar 2025 at 3:44 p.m., Alexios Zavras <zvr+eellak [ at ] zvr [ dot ] gr>
wrote:

> Maira, please use the mailing list [added in cc],
> so that others may see the exchange and get some info.
> I've redacted your proposal text.
>
> The proposal should definitely include a workplan / timeline.
> Ideally it should be in weekly granularity, but two-weeks
> is also acceptable. This is the only way of keeping track
> of the progress of the work.
>
> To your specific question about triplestore alternatives to explore:
> the set you propose is rather exhaustive. I can only think of MillenniumDB
> and probably KuzuDB to add as alternatives.
> It may not be feasible to explore all of these in depth.
> I would suggest that you take a look at each one of them,
> see which you are comfortable with (I mean, install, try them out)
> and focus first on these.
> It's better to end up with a working library for a few
> than an incomplete library trying to handle everything.
> This is another point where your workplan will guide you.
>
> On the API design, I admit I was surprised by your idea
> to use a REST API (and Flask to implement it).
> I don't outright discard this idea, but it has to be justified somehow.
>
> When designing the API, start by thinking what functionality
> the program using the library will require -- that's what you have
> to provide.
> Let's see: it will obviously need to (a) connect to a datastore
> and (b) execute commands (queries or other).
>
> Think of SQLAlchemy (which abstracts SQL database access)
> and the first primitives it provides:
>     from sqlalchemy import create_engine, text
>     engine = create_engine(db_info)
>     with engine.connect() as c:
>         result = c.execute(text(commands))
> These correspond to (a) and (b) above, since the required functionality
> is the same.
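A minimal sketch of the same two primitives for a triplestore, assuming hypothetical names `create_store`, `Store` and `Connection`. To keep it runnable without a server, `Connection.execute` only records the command; a real backend would send it to the store. The Blazegraph URL is an assumed local default.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Connection:
    """Hypothetical connection; a real one would hold an HTTP session or a kuzu connection."""
    url: str
    sent: List[str] = field(default_factory=list)
    closed: bool = False

    def execute(self, command: str) -> None:
        if self.closed:
            raise RuntimeError("connection is closed")
        # Placeholder: a real backend would send the command to the store here.
        self.sent.append(command)


@dataclass
class Store:
    url: str

    @contextmanager
    def connect(self) -> Iterator[Connection]:
        conn = Connection(self.url)
        try:
            yield conn
        finally:
            # Mark the connection closed even if a command raised.
            conn.closed = True


def create_store(url: str) -> Store:
    """Counterpart of SQLAlchemy's create_engine(): cheap, no I/O yet."""
    return Store(url)


store = create_store("http://localhost:9999/blazegraph/sparql")
with store.connect() as c:
    c.execute("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
```

As with SQLAlchemy, creating the store object does no I/O; the `with` block ensures the connection is closed even if a command fails.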
>
> Going further, you might think of providing more functionality
> like bulk load of data (instead of the user doing a number of INSERT
> statements themselves).
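A sketch of such a bulk-load helper, assuming the input is an iterable of already-serialized (subject, predicate, object) terms. It groups triples into batched SPARQL 1.1 `INSERT DATA` updates instead of issuing one update per triple. The names `iri`, `literal` and `insert_data_batches` are hypothetical, and `iri` does no validation.

```python
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def iri(value: str) -> str:
    # Hypothetical helper: wrap an IRI in angle brackets (no validation).
    return f"<{value}>"


def literal(value: str) -> str:
    # Escape characters that would break a double-quoted SPARQL string literal.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def insert_data_batches(triples: Iterable[Triple], batch_size: int = 1000) -> Iterator[str]:
    """Group serialized triples into INSERT DATA updates of at most batch_size triples."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[str] = []
    for s, p, o in triples:
        batch.append(f"  {s} {p} {o} .")
        if len(batch) == batch_size:
            yield "INSERT DATA {\n" + "\n".join(batch) + "\n}"
            batch = []
    if batch:  # flush the final, partially filled batch
        yield "INSERT DATA {\n" + "\n".join(batch) + "\n}"


ex = "http://example.org/"
triples = [
    (iri(ex + "alice"), iri(ex + "name"), literal("Alice")),
    (iri(ex + "bob"), iri(ex + "name"), literal('Bob "the builder"')),
    (iri(ex + "alice"), iri(ex + "knows"), iri(ex + "bob")),
]
updates = list(insert_data_batches(triples, batch_size=2))
```

Each yielded string would be sent as one update request. Many stores also ship native bulk loaders, which are likely to be faster for very large files.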
>
> By thinking this way, you should get to a list of primitives/functions
> that your library should provide.
>
> Implementing those is the next step. To the user, they will be simple
> function calls. Whether you choose to implement them via a REST interface
> over HTTP to a server process is an implementation decision for you.
> I personally find this too complex.
>
> My modeling would be: I expose an API with a connect() function.
> I then implement this for different back-end triplestores:
> connect_blazegraph(), connect_jena(), connect_millennium(), etc.,
> and connect() decides which one to call.
> The same with execute() -- and that's all.
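The function-based model above might look like this minimal sketch. How connect() decides which backend to call is not specified, so the `backend+transport://address` URL convention (loosely modelled on SQLAlchemy's `dialect+driver://` URLs) is an assumption, and the `connect_*` functions are placeholders that only report which backend was chosen.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit


# Placeholder connectors: a real one would open an HTTP session or a
# native client; here each just reports the backend and the address.
def connect_blazegraph(url: str) -> Tuple[str, str]:
    return ("blazegraph", url)


def connect_jena(url: str) -> Tuple[str, str]:
    return ("jena", url)


def connect_millennium(url: str) -> Tuple[str, str]:
    return ("millenniumdb", url)


_CONNECTORS: Dict[str, Callable[[str], Tuple[str, str]]] = {
    "blazegraph": connect_blazegraph,
    "jena": connect_jena,
    "millennium": connect_millennium,
}


def connect(url: str) -> Tuple[str, str]:
    """Dispatch on an assumed 'backend+transport://address' URL scheme."""
    scheme = urlsplit(url).scheme  # e.g. "jena+http"
    backend, plus, transport = scheme.partition("+")
    if not plus or not transport:
        raise ValueError(f"expected 'backend+transport://...', got {url!r}")
    connector = _CONNECTORS.get(backend)
    if connector is None:
        raise ValueError(f"unsupported backend: {backend!r}")
    address = url.split("://", 1)[1]
    # Hand the backend a plain transport URL, e.g. "http://localhost:3030/ds/sparql".
    return connector(f"{transport}://{address}")
```

An execute() function could dispatch the same way, for instance on the type of the connection object that connect() returned.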
>
> Doing it the OO way, there will be a general TripleStore() class
> (with methods connect() and execute()) and various implementation
> classes such as Jena(TripleStore), with their own methods that actually talk
> to a specific back end.
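An object-oriented version of the same idea, as a minimal sketch using Python's `abc` module. `TripleStore` declares `connect()` and `execute()` as abstract, so it cannot be instantiated directly, and `Jena` is a placeholder subclass that records commands instead of sending them to a Fuseki server. The helper `run` is hypothetical and only shows that calling code needs to know nothing beyond the `TripleStore` interface.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class TripleStore(ABC):
    """General interface: every backend must provide connect() and execute()."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def execute(self, command: str) -> Any: ...


class Jena(TripleStore):
    """Placeholder backend: records commands instead of talking to Fuseki."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint
        self.connected = False
        self.log: List[str] = []

    def connect(self) -> None:
        # A real implementation might open an HTTP session to self.endpoint here.
        self.connected = True

    def execute(self, command: str) -> Any:
        if not self.connected:
            raise RuntimeError("call connect() first")
        self.log.append(command)
        return None


def run(store: TripleStore, command: str) -> Any:
    # Works with any TripleStore subclass; no backend-specific code here.
    store.connect()
    return store.execute(command)
```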
>
> Hope this helps,
>
> On Thu, Mar 20, 2025, at 12:48, Maira Papadopoulou wrote:
> > Dear Alexios Zavras,
> >
> > I hope this email finds you well. I have begun drafting my proposal for
> > the project and would appreciate any feedback or guidance you can
> > provide to help refine my approach.
> >
> > [...]
> >
> > Questions
> > This is my proposal so far, but I have a few additional questions.
> > First, is selecting Blazegraph, Virtuoso, GraphDB, Apache Jena TDB, and
> > Stardog as triplestore alternatives sufficient, or would it be better
> > to include more or fewer options? Since I’m still learning about best
> > practices in API development, I would greatly appreciate any guidance
> > or recommendations you can offer regarding this. Specifically, are
> > there any design patterns that would be particularly useful for
> > creating an abstraction layer that works seamlessly across different
> > triplestore alternatives? Additionally, should I follow a typical REST
> > approach (using POST, GET, PUT, DELETE), or is there a more suitable
> > method for interacting with triplestore databases over HTTP? Finally,
> > do you think this approach to the project is sound and are there any
> > concerns or improvements you would suggest? Any guidance you can offer
> > would be greatly appreciated.
> >
> > Looking forward to your response.
> >
> >  Best regards,
> >  Maira Papadopoulou
> >
> > On Wed, 12 Mar 2025 at 3:27 p.m., Alexios Zavras
> > <zvr+eellak [ at ] zvr [ dot ] gr> wrote:
> >> Thanks for your interest in the project, Maira.
> >>
> >> I don't have much to add, and I'm looking forward
> >> to receiving your application.
> >> Keep in mind that for testing and analysis parts,
> >> I can also provide data (more than enough!),
> >> so that you can work with real-world data
> >> and not only synthetic ones.
> >>
> >> On Wed, Mar 12, 2025, at 11:40, Maira Papadopoulou wrote:
> >> > Dear Alexios Zavras,
> >> >
> >> > I hope this email finds you well. My name is Maira and I am a
> >> > third-year undergraduate student at the Department of Informatics and
> >> > Telecommunications of the National and Kapodistrian University of Athens.
> >> > I recently came across the "Exploring and Abstracting Triplestore
> >> > Alternatives" project for Google Summer of Code 2025, and I am very
> >> > interested in contributing to it. I find the idea of analyzing and
> >> > developing an abstraction layer for various triplestore alternatives
> >> > both fascinating and impactful for the semantic web, especially in
> >> > the context of making RDF-based data management more accessible to
> >> > developers.
> >> >
> >> > Regarding the knowledge prerequisites, I have experience in Python and
> >> > C, which I believe will be valuable for both the implementation and
> >> > performance evaluation aspects of the project. Although I have no
> >> > direct experience with SPARQL, I also have experience with SQL, and
> >> > after researching SPARQL's syntax, I noticed that its basic structure
> >> > is quite similar to SQL's. Given this similarity, I believe I can adapt
> >> > to SPARQL quickly and effectively.
> >> >
> >> > To contribute effectively to the project, I plan to follow a
> >> > structured approach aligned with the methodology in the Contributor's
> >> > Guidance:
> >> > a) For the 'Research' part, I will begin by studying different
> >> > triplestore alternatives, such as Blazegraph, Virtuoso, GraphDB and
> >> > Stardog, analyzing their architectures, query execution models, and
> >> > storage strategies.
> >> > b) For the 'Testing' part, I will set up and run some basic SPARQL
> >> > queries on these triplestores, testing their performance
> >> > under different conditions, such as varying dataset sizes.
> >> > c) For the 'Analysis' part, I will identify the strengths and weaknesses
> >> > of each alternative by benchmarking execution times and memory usage.
> >> > d) For the 'Develop' part, I would greatly appreciate further
> >> > clarification on the expected functionality of the Python library.
> >> > Understanding its intended role, whether it should primarily serve as
> >> > a middleware for processing and routing SPARQL queries or also include
> >> > additional features such as translating RDF data into Python objects,
> >> > would help me strengthen my application.
> >> > e) For the 'Documentation' part, once I have a comprehensive
> >> > understanding of the library's implementation, I will ensure
> >> > that the library is thoroughly documented so that other developers can
> >> > easily integrate and use the abstraction layer.
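For the Testing and Analysis parts, a minimal timing harness could look like the sketch below. `run_query` is a hypothetical zero-argument callable that runs one query against one backend; a local computation stands in for it here. Note that `tracemalloc` only sees allocations made by the Python client, so the triplestore server's own memory usage would have to be measured separately, for example through the operating system or the store's own monitoring.

```python
import statistics
import time
import tracemalloc
from typing import Callable, Dict


def benchmark(run_query: Callable[[], object], repeats: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Time repeated runs of run_query and measure client-side peak memory once."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        run_query()  # warm caches so the first timed run is not an outlier
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    # One extra traced run for memory, kept apart from the timed runs
    # because tracemalloc slows down allocation-heavy code.
    tracemalloc.start()
    try:
        run_query()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return {
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
        "peak_client_bytes": float(peak),
    }


# Stand-in workload; a real run would wrap e.g. a SPARQL query sent to one backend.
stats = benchmark(lambda: sorted(range(10_000), reverse=True), repeats=3)
print(stats)
```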
> >> >
> >> > I would love the opportunity to discuss this project further and
> >> > understand how I can best contribute. Please let me know if there are
> >> > any additional resources that would be helpful for me to review.
> >> >
> >> > Looking forward to your response.
> >> >
> >> > Best regards,
> >> > Maira Papadopoulou
> >> > ----
> >> > You are receiving this message from the list: A discussion list for
> >> > student developers and mentors of Google Summer of Code projects,
> >> > https://lists.ellak.gr/gsoc-developers/listinfo.html
> >> > You can unsubscribe from the list by sending an empty email to
> >> > <gsoc-developers+unsubscribe [ at ] ellak [ dot ] gr>.
> >>
> >> --
> >> -- zvr -
>
> --
> -- zvr -
>

