Dear Sirs
Thank you very much for your responses.
I have submitted a draft proposal in case you have time to look
at it. I understand it is short notice, but I would appreciate any
recommendations before I submit my final proposal on Tuesday morning.
Best regards
Ioakeim James Theologou
On 2018-03-24 22:15, Theodoros G. Karounos wrote:
Dear Ioakim,
take a look at this project
(https://ellak.gr/wiki/index.php?title=GSOC2018_Projects#Adding_Greek_language_on_NLP_library_Spacy.io
[8]); its results may be of interest to your project.
T.K.
2018-03-24 22:04 GMT+02:00 Iraklis Varlamis <varlamis [ at ] gmail [ dot ] com>:
Dear Ioakim,
As Dr. Karounos wrote, Python provides some helpful libraries both
for machine learning (scikit-learn) and for text processing
and NLP (e.g. NLTK). Java can certainly be used instead.
Candidate references to Greek government entities (these can be the
named entities, e.g. General Secretariat of ..., or Mayor of ...) in
the text can be found using either regular expressions or machine
learning (once training samples are available), and the same holds
for the assigned responsibilities.
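The regular-expression route mentioned above could look roughly like the
following Python sketch. It is only an illustration under assumptions: the
title prefixes (the "General Secretariat of" and "Mayor of" examples above,
plus the hypothetical "Ministry of" and "Directorate of"), the "is
responsible for" phrasing, and the helper names find_candidate_entities and
link_entities_to_responsibilities are all invented for the example. Real
Government Gazette text is in Greek, so the patterns would need Greek
equivalents.

```python
import re
from typing import List, Tuple

# Hypothetical title prefixes: "General Secretariat of" and "Mayor of" come from
# the examples in the email; "Ministry of" and "Directorate of" are illustrative.
# Real Government Gazette text is Greek, so Greek prefixes would be needed.
TITLE_PREFIXES = ["General Secretariat of", "Mayor of", "Ministry of", "Directorate of"]

# A prefix followed by one or more capitalised words, e.g. "Mayor of Athens".
ENTITY_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in TITLE_PREFIXES) + r")"
    r"(?:\s+[A-Z][\w-]*)+"
)

# Hypothetical phrasing for an assigned responsibility; the duty text runs
# up to the next full stop or semicolon.
RESPONSIBILITY_PATTERN = re.compile(r"\bis responsible for\s+(?P<duty>[^.;]+)")


def find_candidate_entities(text: str) -> List[str]:
    """Return every candidate government-entity mention matched by the regex."""
    return [m.group(0) for m in ENTITY_PATTERN.finditer(text)]


def link_entities_to_responsibilities(text: str) -> List[Tuple[str, str]]:
    """Pair the first entity in each sentence with the responsibility stated there."""
    links = []
    # Naive sentence split: break after "." or ";" followed by whitespace.
    for sentence in re.split(r"(?<=[.;])\s+", text):
        entities = find_candidate_entities(sentence)
        duty = RESPONSIBILITY_PATTERN.search(sentence)
        if entities and duty:
            links.append((entities[0], duty.group("duty").strip()))
    return links


sample = (
    "The General Secretariat of Information Systems is responsible for "
    "the operation of public registries. "
    "The Mayor of Athens is responsible for municipal planning."
)
print(link_entities_to_responsibilities(sample))
# → [('General Secretariat of Information Systems', 'the operation of public registries'), ('Mayor of Athens', 'municipal planning')]
```

The sentence split and the first-entity pairing are deliberately naive; once
training samples exist, a machine-learning NER model would take over the job
of the hand-written prefixes.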
More details can be discussed once the project begins.
Stanford's CoreNLP https://stanfordnlp.github.io/CoreNLP/ [1] and
Apache's OpenNLP https://opennlp.apache.org/ [2] are the two tools
to check if you are going to work with Java.
Iraklis
On Sat, Mar 24, 2018 at 6:20 PM, Theodoros G. Karounos
<t [ dot ] karounos [ at ] gmail [ dot ] com> wrote:
Please find my answers in-line.
2018-03-24 11:08 GMT+02:00 ioaktheo <ioaktheo [ at ] teiser [ dot ] gr>:
Dear Sirs,
I am writing to you regarding my interest in the
project named «Extraction of Responsibilities per unit in public
sector organizations from the Government Gazette». Having read
through the details of the project, I would like to ask some
questions so that I can better understand the requirements. I would
be very grateful if you have the time to answer these questions
before I submit my proposal.
First, I see that the knowledge prerequisites include Python, Java
and Machine Learning. I am more familiar with Java, Machine Learning
and Data Mining. I haven't worked with Python, but I am willing to
learn it before Google Summer of Code starts. Is Python going to be
used for Machine Learning purposes?
_PYTHON IS PREFERRED FOR MACHINE LEARNING, BUT JAVA DOES THE JOB AS
WELL._
Secondly, am I right in understanding that using Machine Learning
to automatically find and match «specific Named Entities types with
references to assigned responsibilities-services per unit and links
between the two must be extracted» is one of the main issues of
this project?
_YES. ONE OF THE MAIN TASKS OF THIS PROJECT IS TO EXTRACT, IN
HIERARCHICAL ORDER, THE ASSIGNED RESPONSIBILITIES-SERVICES FOR EACH
UNIT OF A GREEK GOVERNMENT ENTITY FROM THE TEXT IN THE PDFS OF THE
LAWS THAT DEFINE THAT ENTITY'S GOVERNANCE._
If so, am I right in thinking that the required steps include
data preprocessing, data integration, hierarchical or partitional
clustering, categorization, and correlation rules?
_YES, THIS IS THE APPROACH IN A FEW WORDS; YOU SHOULD EXPAND ON IT IN
YOUR PROJECT. WE WILL DISCUSS THIS EXTENSIVELY WITH ALL THE
MENTORS
(https://ellak.gr/wiki/index.php?title=GSOC2018_Projects#Extraction_of_Responsibilities_per_unit_in_public_sector_organizations_from_the_Government_Gazette
[3]) ONCE THE PROJECT IS APPROVED._
Finally, I am a bit confused about the NER (named entity recognition)
module. Is there any more information on this subject?
_PLEASE READ THIS (https://nlp.stanford.edu/software/CRF-NER.html
[4]). THERE ARE PLENTY MORE RESOURCES; SEARCH GOOGLE SCHOLAR
(https://scholar.google.gr/scholar?hl=el&as_sdt=0%2C5&q=Named+Entity+Recognizer&btnG=
[5]), ETC._
Thank you in advance.
Best regards
Ioakeim
----
You are receiving this message from the
list: General mailing list
for developers/contributors
of open-source software projects - A general
discussion list for developers/contributors of open-source projects,
https://lists.ellak.gr/opensource-devs/listinfo.html [6]
You can unsubscribe from the
list by sending an empty email
to the address
<opensource-devs+unsubscribe [ at ] ellak [ dot ] gr>.
--
Jiddu Krishnamurti: If we can really understand the problem, the
answer will come out of it, because the answer is not separate from
the problem.
http://karounos.gr/blog/ [7], Key-ID: 85AE3458
Links:
------
[1] https://stanfordnlp.github.io/CoreNLP/
[2] https://opennlp.apache.org/
[3] https://ellak.gr/wiki/index.php?title=GSOC2018_Projects#Extraction_of_Responsibilities_per_unit_in_public_sector_organizations_from_the_Government_Gazette
[4] https://nlp.stanford.edu/software/CRF-NER.html
[5] https://scholar.google.gr/scholar?hl=el&as_sdt=0%2C5&q=Named+Entity+Recognizer&btnG=
[6] https://lists.ellak.gr/opensource-devs/listinfo.html
[7] http://karounos.gr/blog/
[8] https://ellak.gr/wiki/index.php?title=GSOC2018_Projects#Adding_Greek_language_on_NLP_library_Spacy.io