Dear Ioakim,

Take a look at this project
( https://ellak.gr/wiki/index.php?title=GSOC2018_Projects#Adding_Greek_language_on_NLP_library_Spacy.io );
its results may be of interest to your project.

T.K.

2018-03-24 22:04 GMT+02:00 Iraklis Varlamis <varlamis [ at ] gmail [ dot ] com>:

> Dear Ioakim,
> As Dr. Karounos wrote, Python provides some helpful libraries both for
> machine learning (scikit-learn) and for text processing and NLP
> (e.g. NLTK). Java can definitely be used instead.
> Candidate references to Greek government entities (these can be the named
> entities, e.g. General Secretariat of ..., or Mayor of ...) in the text can
> be found using either regex or machine learning (once training samples are
> available), and the same holds for the assigned responsibilities.
> More details can be discussed once the project begins.
> Stanford's CoreNLP https://stanfordnlp.github.io/CoreNLP/ and Apache's
> OpenNLP https://opennlp.apache.org/ are the two tools to check if you are
> going to work with Java.
>
> Iraklis
>
> On Sat, Mar 24, 2018 at 6:20 PM, Theodoros G. Karounos <
> t [ dot ] karounos [ at ] gmail [ dot ] com> wrote:
>
>> Please find my answers in-line.
>>
>> 2018-03-24 11:08 GMT+02:00 ioaktheo <ioaktheo [ at ] teiser [ dot ] gr>:
>>
>>> Dear Sirs,
>>>
>>> I am writing to you regarding my interest in the project named
>>> «Extraction of Responsibilities per unit in public sector
>>> organizations from the Government Gazette». Having read through the
>>> details of the project, I would like to ask some questions so that I
>>> can better understand the requirements. I would be very grateful if
>>> you had the time to answer them before I submit my proposal.
>>>
>>> First, I see that the knowledge prerequisites include Python, Java and
>>> machine learning. I’m more familiar with Java, machine learning and data
>>> mining.
>>> I haven’t worked with Python, but I am willing to sit down and work
>>> with this language before Google Summer of Code starts. Is Python going
>>> to be used for machine learning purposes?
>>>
>> *Python is preferred for machine learning, but Java does the job as well.*
>>
>>> Secondly, am I right in understanding that using machine learning to
>>> automatically find and match «specific Named Entities types with references
>>> to assigned responsibilities-services per unit and links between the two
>>> must be extracted» is one of the main issues of this project?
>>>
>> *Yes, one of the main tasks of this project is to extract, in hierarchical
>> order, the assigned responsibilities-services for each unit of an
>> institution, from the text of the PDFs of the laws that define the
>> governance of Greek government entities.*
>>
>>> If so, am I right in thinking the steps required include: preprocessing
>>> the data, data integration, hierarchical or partitioned clustering,
>>> categorization and correlation rules?
>>>
>> *Yes, this is the approach in a few words; you should expand on it in your
>> proposal. But we will discuss this extensively with all the mentors
>> ( https://ellak.gr/wiki/index.php?title=GSOC2018_Projects#Extraction_of_Responsibilities_per_unit_in_public_sector_organizations_from_the_Government_Gazette )
>> once we have the project approved.*
>>
>>> Finally, I am a bit confused about the NER module. Is there any more
>>> information on this subject?
>>>
>> *Please read this ( https://nlp.stanford.edu/software/CRF-NER.html ); there
>> are plenty more resources, e.g. search Google Scholar
>> ( https://scholar.google.gr/scholar?hl=el&as_sdt=0%2C5&q=Named+Entity+Recognizer&btnG= ),
>> etc.*
>>
>>> Thank you in advance.
>>> Best regards
>>> Ioakeim
>

--
Jiddu Krishnamurti: If we can really understand the problem, the answer
will come out of it, because the answer is not separate from the problem.

http://karounos.gr/blog/, Key-ID: 85AE3458
----
You are receiving this message from the list: A general discussion list
for developers/contributors of open-source projects,
https://lists.ellak.gr/opensource-devs/listinfo.html

You can unsubscribe from the list by sending an empty email to
<opensource-devs+unsubscribe [ at ] ellak [ dot ] gr>.
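
As a rough illustration of the regex route mentioned in the thread for finding candidate references to government entities (e.g. "General Secretariat of ...", "Mayor of ..."), here is a minimal Python sketch. The trigger phrases, the function name `find_candidate_entities` and the English sample sentence are all assumptions made for illustration; the real Government Gazette text is in Greek and would need Greek patterns, and a trained NER model may well replace this step entirely.

```python
import re

# Hypothetical trigger phrases (assumption): the real Gazette text is Greek,
# so these English patterns only illustrate the idea.
ENTITY_PATTERN = re.compile(
    r"\b(?:General Secretariat|Ministry|Directorate|Department|Mayor) of "
    r"[A-Z][\w-]*"                      # first capitalised word of the name
    r"(?: (?:and |of )?[A-Z][\w-]*)*"   # further capitalised words, if any
)


def find_candidate_entities(text):
    """Return candidate government-entity mentions in order of appearance."""
    return [match.group(0) for match in ENTITY_PATTERN.finditer(text)]


# Made-up sample sentence, not taken from any actual law.
sample = (
    "The General Secretariat of Public Revenue reports to the Ministry of Finance. "
    "The Mayor of Athens signs the decision."
)

print(find_candidate_entities(sample))
# ['General Secretariat of Public Revenue', 'Ministry of Finance', 'Mayor of Athens']
```

Matches like these could also serve as seed training samples for a machine-learning recognizer once enough of them have been checked by hand.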