Dear Alexios,

Over the past few days I have been studying the SPDX v3 specification in more depth in order to better understand its data model and overall structure. While doing so, a few questions came up that I would greatly appreciate your thoughts on.

First, regarding Profiles: since SPDX v3 is modular (Core, Software, Security, Licensing, AI, etc.), should the project aim to support all profiles uniformly, or would it be more reasonable to initially focus on a subset (e.g., Core + Software + Licensing) and treat the rest as extensions? I am trying to understand whether profile coverage should influence the abstraction layer design from the beginning, especially since each profile introduces different classes and properties.

Second, I have started experimenting with small demo SBOMs in RDF and attempted to merge them into a shared graph. As expected, the main challenge appears to be identity resolution and element reuse. In my prototype, I implemented a simple canonicalization strategy where packages are considered identical based on (name, version). This allowed successful reuse of shared dependencies across SBOMs. However, I am aware that in practice stronger identifiers may be preferable (e.g., purl, CPE, hash, externalIdentifier, etc.), and that fallback strategies may be required when some fields are missing.

In this context, I would like to ask: should the library itself define clear rules for when two elements are considered the “same” (for example, specifying which fields determine identity for Packages, Agents, Files, etc.), or should it mainly provide the technical mechanism for merging and leave the actual identity criteria configurable by the user?

I am asking this in order to better understand the expected scope and responsibilities of the project, so that I can structure a strong and well-aligned proposal.

Thank you very much for your time and guidance.
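For illustration only, the "configurable identity criteria" option in the question above could look roughly like the following sketch. It is not the prototype mentioned in this email and not an existing library API: the `Package` fields (`purl`, `sha256`) and all function names are hypothetical stand-ins, and real SPDX v3 elements carry these values in different properties.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence

# Hypothetical, simplified stand-in for an SPDX package element.
@dataclass(frozen=True)
class Package:
    name: str
    version: Optional[str] = None
    purl: Optional[str] = None
    sha256: Optional[str] = None

# A strategy returns a hashable identity key, or None if it cannot decide.
KeyFn = Callable[[Package], Optional[tuple]]

def by_purl(pkg: Package) -> Optional[tuple]:
    return ("purl", pkg.purl) if pkg.purl else None

def by_hash(pkg: Package) -> Optional[tuple]:
    # Hex digests are case-insensitive, so normalize to lowercase.
    return ("sha256", pkg.sha256.lower()) if pkg.sha256 else None

def by_name_version(pkg: Package) -> Optional[tuple]:
    if pkg.name and pkg.version:
        return ("name-version", pkg.name, pkg.version)
    return None

# Assumed default order: strongest identifier first, (name, version) last.
DEFAULT_STRATEGIES: Sequence[KeyFn] = (by_purl, by_hash, by_name_version)

def identity_key(pkg: Package,
                 strategies: Sequence[KeyFn] = DEFAULT_STRATEGIES) -> Optional[tuple]:
    """Return the key from the first strategy that applies, else None."""
    for strategy in strategies:
        key = strategy(pkg)
        if key is not None:
            return key
    return None

def merge_packages(sboms: Iterable[Iterable[Package]],
                   strategies: Sequence[KeyFn] = DEFAULT_STRATEGIES):
    """Reuse one element per identity key; collect packages with no key."""
    merged: dict = {}
    unresolved: list = []
    for sbom in sboms:
        for pkg in sbom:
            key = identity_key(pkg, strategies)
            if key is None:
                unresolved.append(pkg)
            else:
                merged.setdefault(key, pkg)  # first occurrence wins
    return merged, unresolved
```

Passing `strategies=(by_name_version,)` reproduces the plain (name, version) rule, while the default order prefers stronger identifiers. One limitation of this first-match fallback: the same package described with a purl in one SBOM and without one in another receives two different keys and is not merged, so a real implementation would need to reconcile keys across strategies.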
I look forward to your feedback and any suggestions you may have on how to approach this aspect of the project.

Best regards,
Maira Papadopoulou

---------- Forwarded message ---------
From: Alexios Zavras <zvr+eellak [ at ] zvr [ dot ] gr>
Date: Thu, 26 Feb 2026 at 3:41 PM
Subject: Re: [gsoc-developers] Interest and some Questions about "Unified SBOM Management via RDF Database Abstraction" Project
To: Maira Papadopoulou <mmpapadopoulouu [ at ] gmail [ dot ] com>, <gsoc-developers [ at ] ellak [ dot ] gr>

Hi Maira,

thanks for your interest.

I believe your email accurately captures the essence of the project.

A couple of notes:
- focus on SPDXv3 (rather than v2): the model is completely based on graph data, rather than on documents.
- on the other hand, there is an abundance of data in SPDXv2 format and only a little in SPDXv3. So, handling v2 will be required for real-data testing.
- I think the major point of the project (and the difficulty) will not be the store-SBOM / reproduce-SBOM functionality. It will be the combination of data from different SBOMs and the reuse of elements common in them, which should happen transparently.

Let me know if you have more questions.

On Mon, Feb 23, 2026, at 16:50, Maira Papadopoulou wrote:
> Dear Alexios Zavras,
>
> I hope this email finds you well.
>
> My name is Maira Papadopoulou, and I am a fourth-year undergraduate
> student at the Department of Informatics and Telecommunications of the
> National and Kapodistrian University of Athens. I recently reviewed the
> "Unified SBOM Management via RDF Database Abstraction" project for
> Google Summer of Code 2026, and I am very interested in contributing to
> it.
>
> Having worked extensively with the triplestore abstraction library in
> the past, I am particularly motivated by the idea of continuing and
> expanding this effort.
> Through my previous experience developing the
> triplestore abstraction library, I gained practical exposure to RDF
> modeling, SPARQL querying, backend abstraction design, and overcoming
> the challenges of building a backend-agnostic library. This has made me
> comfortable working both with semantic data models and with the
> architectural aspects required to design clean and extensible libraries.
>
> Although I did not have much prior experience with SBOM data or the
> SPDX standard, I have recently started exploring both the v2 and v3
> specifications. What I find particularly interesting about SPDX 3.0 is
> its graph-oriented design, since its ontology-based structure maps
> naturally to RDF concepts and triplestore storage. From what I
> currently understand, the project is not just about converting formats.
> Rather, it focuses on properly ingesting SPDX-based SBOM data into an
> RDF store, managing it in a structured way, and being able to
> reconstruct valid SPDX documents from the stored graph using the
> abstraction layer.
>
> To effectively contribute to the project, I was thinking of approaching
> it in a structured way, while remaining flexible depending on the
> architectural direction you have in mind.
>
> Initially, I would like to focus on understanding the SPDX data model
> in depth and how its ontology maps to RDF structures in practice. Once
> I feel confident with the model, I would start by experimenting with
> basic CRUD operations for SBOM data in the triplestore abstraction
> layer. I believe this would help me better understand how SPDX entities
> behave once stored as triples and what design considerations might
> emerge early on. After that, I would likely move to the Store-to-SBOM
> Exporter. Working from the export side seems like a good way to
> validate that the stored graph structure is sufficiently expressive to
> reconstruct standard-compliant SPDX documents.
> From there, I would
> implement the reverse flow (SBOM-to-Store ingestion), ensuring that the
> mapping from SPDX documents to RDF graphs preserves the intended
> semantics and relationships. Finally, I would focus on strengthening
> the workflow with testing, validation, and documentation so that
> ingest, manage, and export operations work consistently across
> supported backends.
>
> I would really appreciate your thoughts on whether this direction is
> sound. In particular, from your perspective, do you see the main
> technical challenge lying in the RDF modeling of SPDX 3.0 itself,
> meaning I should focus more deeply on understanding the specification
> and ontology, or rather in the integration between SPDX data and the
> triplestore abstraction layer?
>
> I would love to discuss the project in more detail and better
> understand the architectural goals you have in mind. If there are
> specific technical aspects you would recommend prioritizing at this
> stage, I would be happy to explore them further.
>
> Looking forward to your response.
>
> Best regards,
> Maira Papadopoulou
> ----
> You are receiving this message from the list: A discussion list for
> student developers and mentors of Google Summer of Code projects,
> https://lists.ellak.gr/gsoc-developers/listinfo.html
> You can unsubscribe from the list by sending an empty email to
> <gsoc-developers+unsubscribe [ at ] ellak [ dot ] gr>.

--
-- zvr -