ΕΕΛΛΑΚ - Mailing Lists

Re: Expression of interest and questions w.r.t. :Exploring and Abstracting Triplestore Alternatives

  • Subject: Re: Expression of interest and questions w.r.t. :Exploring and Abstracting Triplestore Alternatives
  • From: ΑΒΟΥΡΗΣ ΛΑΜΠΡΟΣ <up1092732 [ at ] ac [ dot ] upatras [ dot ] gr>
  • Date: Sun, 23 Mar 2025 18:13:33 +0200
On 2025-03-23 17:16, Alexios Zavras wrote:
Thanks for your interest in the project, Lampro.

Quick replies:

1. the goal of the project is to provide an abstraction layer
to existing triplestores. You could implement a new one
of your own for testing purposes or for benchmark baseline,
but don't plan to spend too much time on this task,
as it's not part of the goal.

2. SPARQL (not SparkQL) is the standard query language
for RDF data, so it makes sense to be the common denominator
that all triplestores implement and handle.
If there are other ways of interacting with specific backends,
it would be valuable to expose them to the user as well.

3. I've written in a previous reply the very basic structure
of the implementation, so I won't repeat it here.
Regarding benchmarking, we will have a common scenario
to run with different backends: create a database, load this data,
perform these queries, etc.

4. Regarding scheduling, I expect to see a weekly plan
of work, results and milestones.
Keep in mind that:
- work starts on June 2nd; and
- this project has been scoped as requiring 350 hours of work.
It's up to you to consider all your other commitments
(time off, exams, other work, ...) and come up with a plan
of how much work will be done at every stage.
Most GSoC projects are on a 12-week scale (3 months),
but this might be too ambitious for this one,
so feel free to extend it according to your planning.
You will be expected to follow the plan you submit
(after some adjustments that might be made before work starts).

5. I can't think of any "special requirements".
I might post a message about what I expect to see in a proposal.
Oh, and a clarification about using "AI"...


On Sun, Mar 23, 2025, at 14:13, ΑΒΟΥΡΗΣ ΛΑΜΠΡΟΣ wrote:
To whom it may concern, and especially to Dr. Alexios Zavras,

Hello, my name is Lampros Avouris. I am a fourth-year electrical
engineering and computer technologies student at the University of
Patras.
I am looking to participate in GSoC 2025, and the Exploring and
Abstracting Triplestore Alternatives project has especially piqued my
interest.

I have spent quite a bit of time looking into the project and have some
questions, so that I can create the best possible project proposal.

Firstly: when referring to triplestore alternatives, are we talking
exclusively about systems with native RDF triplestore support, like
Apache Jena, Stardog, etc., or would it be useful to explore
implementing triplestores in technologies not purpose-built for them,
such as adapting graph or multi-model databases like Amazon Neptune or
ArangoDB, or even implementing triplestores in traditional databases
like PostgreSQL or SQLite?

My first thought was to create at least one case of each kind for our
comparisons and expand from there if possible/necessary, focusing
initially on technologies that have RDFLib support.
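For the option of implementing a triplestore in a traditional relational database, the core idea can be sketched with Python's built-in `sqlite3` module. The design uses one table of (subject, predicate, object) rows, plus extra indexes so that a pattern bound on any single position can use an index. This is a hypothetical baseline: the function names `open_store`, `add` and `match` are illustrative, and every term is stored as plain text. It ignores literal datatypes, blank nodes and SPARQL entirely.

```python
import sqlite3

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # The primary key doubles as an (s, p, o) index and rejects duplicate triples.
    conn.execute("CREATE TABLE IF NOT EXISTS triples ("
                 "s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL, "
                 "PRIMARY KEY (s, p, o))")
    # Extra orderings so patterns bound only on p or only on o avoid a full scan.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_pos ON triples (p, o, s)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_osp ON triples (o, s, p)")
    return conn

def add(conn: sqlite3.Connection, triples) -> None:
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", triples)

def match(conn: sqlite3.Connection, s=None, p=None, o=None) -> list:
    # Build a WHERE clause only for bound positions; column names are fixed
    # literals, and values always go through ? placeholders.
    clauses, params = [], []
    for column, value in (("s", s), ("p", p), ("o", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = "SELECT s, p, o FROM triples" + where + " ORDER BY s, p, o"
    return conn.execute(sql, params).fetchall()

conn = open_store()
add(conn, [("ex:a", "ex:knows", "ex:b"), ("ex:b", "ex:knows", "ex:c")])
print(match(conn, p="ex:knows"))
```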

Secondly, when it comes to querying our triplestores, am I correct in
understanding that the only query language we will support is SparkQL?
That is, the user will query in SparkQL and not in any native query
language like Gremlin or GraphQL.

Thirdly, in terms of the project structure, the way I imagine it is the
following: we have an abstraction layer accessible by the user, as
described. Within it we have a TriplestoreType class, which would serve
as an enumeration of all the types of triplestores we support, and a
triplestoreFactory class, which would create the required triplestore
implementation as defined in a separate implementation class, e.g.,
RDFLibTriplestore for triplestores that can be implemented using the
rdflib Python library. Finally, we would have a manager class that would
serve as a unified interface. At minimum, I think each implementation
should provide methods to add any number of triples, query triples and
remove triples, as well as to execute SPARQL queries and to read from
and write to files. I am open to, and would appreciate, any proposals
from you about expanding these methods, as well as any criticism of my
implementation and where it could be improved.
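The structure described above could look roughly like this in Python. It is a sketch under several assumptions. The method names (`add`, `remove`, `match`, `sparql`) are hypothetical, and terms are plain strings treated as IRIs. File reading and writing are left out for brevity; in the rdflib-backed class, they would map onto rdflib's `Graph.parse` and `Graph.serialize`. `rdflib` is imported only when an `RDFLibTriplestore` is actually created, so the in-memory path runs without it installed.

```python
import abc
from enum import Enum
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # terms are plain strings in this sketch

class TriplestoreType(Enum):
    MEMORY = "memory"
    RDFLIB = "rdflib"

class Triplestore(abc.ABC):
    """Operations every backend must implement (method names are hypothetical)."""
    @abc.abstractmethod
    def add(self, triples: Iterable[Triple]) -> None: ...
    @abc.abstractmethod
    def remove(self, s=None, p=None, o=None) -> None: ...
    @abc.abstractmethod
    def match(self, s=None, p=None, o=None) -> list: ...
    @abc.abstractmethod
    def sparql(self, query: str) -> list: ...

class MemoryTriplestore(Triplestore):
    """Naive set-based store, usable as a test double or benchmark baseline."""
    def __init__(self) -> None:
        self._triples: set = set()

    def _hits(self, s, p, o) -> list:
        # None in the pattern matches any value in that position.
        return [t for t in self._triples
                if all(q is None or q == v for q, v in zip((s, p, o), t))]

    def add(self, triples):
        self._triples.update(triples)

    def remove(self, s=None, p=None, o=None):
        self._triples.difference_update(self._hits(s, p, o))

    def match(self, s=None, p=None, o=None):
        return sorted(self._hits(s, p, o))

    def sparql(self, query):
        raise NotImplementedError("the baseline store has no SPARQL engine")

class RDFLibTriplestore(Triplestore):
    """Wraps rdflib.Graph; rdflib is imported lazily so other backends work without it."""
    def __init__(self) -> None:
        import rdflib
        self._rdflib = rdflib
        self._graph = rdflib.Graph()

    def _pattern(self, s, p, o):
        # Assumption: every term is an IRI; literals and blank nodes are not handled.
        return tuple(None if t is None else self._rdflib.URIRef(t) for t in (s, p, o))

    def add(self, triples):
        for triple in triples:
            self._graph.add(self._pattern(*triple))

    def remove(self, s=None, p=None, o=None):
        self._graph.remove(self._pattern(s, p, o))  # None acts as a wildcard

    def match(self, s=None, p=None, o=None):
        return sorted(tuple(str(t) for t in triple)
                      for triple in self._graph.triples(self._pattern(s, p, o)))

    def sparql(self, query):
        return list(self._graph.query(query))

class TriplestoreFactory:
    _registry = {TriplestoreType.MEMORY: MemoryTriplestore,
                 TriplestoreType.RDFLIB: RDFLibTriplestore}

    @classmethod
    def create(cls, kind: TriplestoreType) -> Triplestore:
        return cls._registry[kind]()

class TriplestoreManager:
    """Unified interface: user code only chooses a TriplestoreType."""
    def __init__(self, kind: TriplestoreType) -> None:
        self._store = TriplestoreFactory.create(kind)

    def add(self, triples: Iterable[Triple]) -> None: self._store.add(triples)
    def remove(self, s=None, p=None, o=None) -> None: self._store.remove(s, p, o)
    def match(self, s=None, p=None, o=None) -> list: return self._store.match(s, p, o)
    def sparql(self, query: str) -> list: return self._store.sparql(query)

manager = TriplestoreManager(TriplestoreType.MEMORY)
manager.add([("ex:alice", "ex:knows", "ex:bob")])
print(manager.match(s="ex:alice"))  # [('ex:alice', 'ex:knows', 'ex:bob')]
```

With the enum and registry, the factory is the only place that knows about concrete classes. Adding a backend would then mean writing one new subclass and adding one registry entry.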

With regard to the benchmarking, what specific cases are you looking
for: simple accesses of the DB, or more complex operations?

I assume the criteria will be time and RAM usage.

Fourthly, I have already seen that you would appreciate a weekly or at
least a biweekly schedule. Could you please be more specific about what
you would require from the schedule, e.g., would you have a problem with
me taking some time off during exam season, how quickly would you like
me to reach each project goal, etc.? Additionally, I assume that you
would like to see a domain diagram of the project.

Regarding scheduling, I am also hoping to participate in the Code in
Place program as an instructor. I am 99% certain that it won't cause a
scheduling conflict with our meetings, but I thought I should mention it
in advance so as not to surprise you later.

Fifthly?: The GSoC site mentions that we should ask for any special
requirements about crafting our application; would you happen to have
any of those?

I apologize for the length of this message and thank you in
advance.

Yours sincerely,

Lampros Avouris



----
You are receiving this message from the list: A discussion list for
student developers and mentors of Google Summer of Code projects,
https://lists.ellak.gr/gsoc-developers/listinfo.html
You can unsubscribe from the list by sending an empty email to
<gsoc-developers+unsubscribe [ at ] ellak [ dot ] gr>.

Thank you very much for the prompt response.

1) Regarding point 1, and taking note of your clarification, I think it would be better to start with technologies like Apache Jena or RDF4J and expand from there, without getting into implementing RDF storage on other types of databases, at least initially.

2) As for supporting query languages like Gremlin, this only mattered for adapting graph databases that use them to store triples, so it won't be necessary for the project, at least initially.

3) Ok, I will review that to refine my approach.

4) Thank you for the clarifications. I assume, of course, that the plan must follow the timing guidelines that GSoC has posted.

5) I would very much appreciate that, if it is not too much trouble. Any additional resources I should look into would also be most welcome.
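Apache Jena (through its Fuseki server) and RDF4J can both be run as HTTP servers. One way for the abstraction layer to reach them is therefore the standard SPARQL 1.1 Protocol, rather than language-specific client libraries. The sketch below sends a query with a direct POST and flattens the standard SPARQL JSON results format. The function names and the endpoint URL are hypothetical placeholders. Whether a given server accepts `application/sparql-query` POSTs should be checked against its documentation.

```python
import json
import urllib.request

def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL 1.1 Protocol 'query via POST directly': the body is the raw query text."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )

def parse_bindings(payload: str) -> list:
    """Flatten SPARQL JSON results (SELECT only) into one {variable: value} dict per row."""
    doc = json.loads(payload)
    # Unbound variables are simply absent from a row, so they are absent here too.
    return [{var: cell["value"] for var, cell in row.items()}
            for row in doc["results"]["bindings"]]

def select(endpoint: str, query: str) -> list:
    """Send a SELECT query and return its rows; requires a running server."""
    with urllib.request.urlopen(build_query_request(endpoint, query)) as response:
        return parse_bindings(response.read().decode("utf-8"))

# Hypothetical usage against a local Fuseki dataset named "ds":
# rows = select("http://localhost:3030/ds/sparql", "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
```

Because every SPARQL-capable server speaks the same protocol, a single adapter like this could serve several backends. Only the endpoint URL, and possibly authentication, would change.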

Thank you very much for your time.

Lampros Avouris