Thanks for your interest in the project, Lampro.
Quick replies:
1. The goal of the project is to provide an abstraction layer
over existing triplestores. You could implement a new one
of your own for testing purposes or as a benchmark baseline,
but don't plan to spend too much time on this task,
as it's not part of the goal.
2. SPARQL (not SparkQL) is the standard query language
for RDF data, so it makes sense for it to be the common denominator
that all triplestores implement and handle.
If there are other ways of interacting with specific backends,
it would be valuable to make them available to the user as well.
3. I've written in a previous reply the very basic structure
of the implementation, so I won't repeat it here.
Regarding benchmarking, we will have a common scenario
to run with different backends: create a database, load this data,
perform these queries, etc.
4. Regarding scheduling, I expect to see a weekly plan
of work, results and milestones.
Keep in mind that:
- work starts on June 2nd; and
- this project has been scoped as requiring 350 hours of work.
It's up to you to consider all your other commitments
(time off, exams, other work, ...) and come up with a plan
of how much work will be done at every stage.
Most GSoC projects are on a 12-week scale (3 months),
but that timeline might be too tight for this one,
so feel free to extend it according to your planning.
You will be expected to follow the plan you submit
(after some adjustments that might be made before work starts).
5. I can't think of any "special requirements".
I might post a message about what I expect to see in a proposal.
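
To make the benchmarking scenario in point 3 a bit more concrete, here is
a minimal sketch of one run: create a database, load data, perform
queries, and time each step. The SqliteBaseline class, its method names
and the run_scenario helper are assumptions made up for illustration;
a real backend would be queried through SPARQL rather than the simple
pattern match used here.

```python
import sqlite3
import time


class SqliteBaseline:
    """Hypothetical throwaway baseline: one three-column table in SQLite."""

    def __init__(self):
        # ":memory:" keeps the sketch self-contained; a real run would use a file.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")

    def load(self, triples):
        self.db.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
        self.db.commit()

    def match(self, s=None, p=None, o=None):
        # None acts as a wildcard, like a variable in a triple pattern.
        clauses, params = [], []
        for column, value in (("s", s), ("p", p), ("o", o)):
            if value is not None:
                clauses.append(f"{column} = ?")
                params.append(value)
        where = " WHERE " + " AND ".join(clauses) if clauses else ""
        return self.db.execute("SELECT s, p, o FROM triples" + where, params).fetchall()


def run_scenario(make_backend, triples, patterns):
    """Time each step of the common scenario for one backend."""
    timings = {}

    start = time.perf_counter()
    backend = make_backend()
    timings["create"] = time.perf_counter() - start

    start = time.perf_counter()
    backend.load(triples)
    timings["load"] = time.perf_counter() - start

    start = time.perf_counter()
    results = [backend.match(*pattern) for pattern in patterns]
    timings["query"] = time.perf_counter() - start

    return timings, results


data = [("ex:alice", "ex:knows", "ex:bob"), ("ex:bob", "ex:knows", "ex:carol")]
timings, results = run_scenario(SqliteBaseline, data, [(None, "ex:knows", None)])
print(timings, len(results[0]))
```

The same run_scenario call would be repeated with a different
make_backend for each backend under comparison.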
Oh, and a clarification about using "AI"...
On Sun, Mar 23, 2025, at 14:13, ΑΒΟΥΡΗΣ ΛΑΜΠΡΟΣ wrote:
To whom it may concern, and especially to Dr. Alexios Zavras,
Hello, my name is Lampros Avouris. I am a fourth-year electrical
engineering and computer technologies student at the University of
Patras.
I am looking to participate in the 2025 GSOC, and the Exploring and
Abstracting Triplestore Alternatives project has especially piqued my
interest.
I have looked into the project for quite a bit and have some questions
so that I am able to create the best possible project proposal.
Firstly: When referring to Triplestore alternatives, are we talking
exclusively about systems with native RDF triple store support like
Apache Jena, Stardog, etc., or will it be useful to explore the
implementation of triplestores in alternative technologies not
purpose-built for them, such as adapting graph or multi-model
databases like Amazon Neptune or ArangoDB, or even implementing
triplestores in traditional databases like PostgreSQL or SQLite?
My first thought was to create at least one case of each kind for our
comparisons and expand from there if possible or necessary, focusing
initially on technologies that have RDFLib support.
Secondly, when it comes to querying our triplestores, am I correct in
understanding that the query language we will support is SparkQL only?
As in the user will query in SparkQL and not any native query language
like Gremlin or GraphQL.
Thirdly, in terms of the project structure, the way that I imagined it
is the following: We have an abstraction layer accessible by the user
as described. Within it we have a TriplestoreType class, which would
serve as an enumeration of all the types of triplestores we support,
and a triplestoreFactory class, which would create the corresponding
triplestore implementation as we define it in a separate
implementation class, e.g., RDFLibTriplestore for triplestores that
can be implemented using the rdflib Python library, etc. Finally, we
would have a manager class that would serve as a unified interface.
The methods I think each implementation should have at minimum are
methods to add any number of triples, query triples, and remove
triples, as well as execute SPARQL queries and write to and read from
files. I am open to and would appreciate any proposals from you about
expanding these methods, and would also appreciate any criticism of my
implementation and where it could be improved.
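
As a rough illustration, the structure described above could be sketched
along these lines. This is only a sketch under assumptions: the
InMemoryTriplestore backend and the create_triplestore function are
hypothetical names standing in for a real implementation class and the
factory class, and SPARQL execution and file I/O are left out for
brevity.

```python
from abc import ABC, abstractmethod
from enum import Enum


class TriplestoreType(Enum):
    """Enumeration of supported backends (only a toy one here)."""
    IN_MEMORY = "in_memory"


class Triplestore(ABC):
    """Minimal interface every backend implementation would provide."""

    @abstractmethod
    def add(self, triples): ...

    @abstractmethod
    def remove(self, triples): ...

    @abstractmethod
    def triples(self, s=None, p=None, o=None): ...


class InMemoryTriplestore(Triplestore):
    """Hypothetical stand-in backend: a plain Python set of tuples."""

    def __init__(self):
        self._data = set()

    def add(self, triples):
        self._data.update(triples)

    def remove(self, triples):
        self._data.difference_update(triples)

    def triples(self, s=None, p=None, o=None):
        # None is a wildcard for that position.
        return [t for t in self._data
                if all(want is None or want == got
                       for want, got in zip((s, p, o), t))]


# Registry used by the factory; new backends would be added here.
_BACKENDS = {TriplestoreType.IN_MEMORY: InMemoryTriplestore}


def create_triplestore(kind):
    """Factory: map an enum member to a concrete implementation."""
    return _BACKENDS[kind]()


store = create_triplestore(TriplestoreType.IN_MEMORY)
store.add([("ex:a", "ex:knows", "ex:b"), ("ex:b", "ex:knows", "ex:c")])
store.remove([("ex:b", "ex:knows", "ex:c")])
print(store.triples(p="ex:knows"))
```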
With regard to the benchmarking, what are the exact specific cases you
are looking for? Simple accesses of the DB, more complex operations?
I assume the criteria will be time and RAM usage.
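
If time and memory are indeed the criteria, both could be captured with
the Python standard library alone, as in the sketch below. The measure
helper is a hypothetical name, and tracemalloc only tracks memory
allocated by the Python interpreter itself, so a backend running as a
separate server process would need a different kind of measurement.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func once; return (result, elapsed seconds, peak bytes allocated)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func raises.
        tracemalloc.stop()
    return result, elapsed, peak


# Stand-in workload: build 10,000 tuples shaped like triples.
result, elapsed, peak = measure(lambda: [(i, "p", i + 1) for i in range(10_000)])
print(f"{len(result)} triples, {elapsed:.4f} s, peak {peak / 1024:.0f} KiB")
```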
Fourthly, I have already seen that you would appreciate a weekly or at
least a biweekly schedule. Could you please be more specific about
exactly what you would require in the schedule, e.g., would you have
a problem with my taking some time off for exam season, how quickly
you would like me to reach each project goal, etc.? Additionally, I
assume that you would like to see a domain diagram of the project.
Regarding scheduling, I am also hoping to participate in the Code in
Place program as an instructor. I am 99% certain that it won't cause a
scheduling conflict in terms of our meetings, but I thought I should
mention it in advance so as not to surprise you later.
Fifthly?: The GSoC site mentions that we should ask for any special
requirements about crafting our application; would you happen to have
any of those?
I apologize for the length of this message and thank you in
advance.
Yours sincerely,
Lampros Avouris
----
You are receiving this message from the list: A discussion list for
student developers and mentors of Google Summer of Code projects,
https://lists.ellak.gr/gsoc-developers/listinfo.html
You can unsubscribe from the list by sending an empty email to
<gsoc-developers+unsubscribe [ at ] ellak [ dot ] gr>.