I build software prototypes (sometimes with the help of very proficient people) for machine learning applications, mostly to solve Information Retrieval, Text Mining and Natural Language Processing tasks. The software described on this page was presented at various national and international scientific evaluation campaigns.
I code mostly in Perl and Java and use SVM Lib, BoosTexter, CRF++ and Weka for machine learning purposes.
CoNLL Shared Task – Natural Language Processing: 9th system (after corrections on metrics decided by the revision committee in December 2013; updated results to appear on the website and in a future paper, see metrics updates and news here)
Ester 2 – Information extraction: 1st system (3 tasks out of 4)
- Text Miner for DEFT 2013 : DEFT is an annual French-language text mining evaluation campaign. This 9th edition focused on the automatic analysis of recipes in French. This system obtained the best results on task 1 and the second-best results on task 2 of the DEFT 2013 campaign.
- SemLinker : a system built for the NIST-TAC KBP 2013 evaluation campaign. SemLinker is an experimental platform intended to study and solve various aspects of semantic annotation. An improved version developed by the CSFG team was deployed for the NIST 2014 evaluation.
- NLGbAse : NLGbAse is an architecture that produces metadata and components devoted to Natural Language Processing and to semantic analysis and labeling tasks. NLGbAse transforms encyclopedic text content into structured knowledge fully integrated with the LinkedData network and the Semantic Web.
- Poly-co : a coreference solver. This system integrates a multilayer perceptron classifier in a pipeline approach. Heuristics are used to select the pairs of coreference candidates fed to the network for training, together with our feature selection method. The features used in our approach are based on similarity and identity measures, on filtering information such as gender and number, and on other syntactic information. Evaluated in the CoNLL 2011 Shared Task.
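The pair-selection step described above can be sketched in a few lines. This is a hypothetical illustration, not Poly-co's actual code: mention pairs are kept only when gender and number are compatible, and each surviving pair is mapped to a small feature vector (string identity, gender and number agreement) of the kind a multilayer perceptron classifier could consume. All names and features here are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class CorefPairSketch {

    // Minimal mention representation: surface form plus gender/number tags
    // ("unk" means the attribute could not be determined).
    static class Mention {
        final String text; final String gender; final String number;
        Mention(String text, String gender, String number) {
            this.text = text; this.gender = gender; this.number = number;
        }
    }

    // Heuristic filter: a pair is a candidate only if gender and number agree;
    // an "unk" value is compatible with anything.
    static boolean compatible(Mention a, Mention b) {
        return agree(a.gender, b.gender) && agree(a.number, b.number);
    }
    static boolean agree(String x, String y) {
        return x.equals("unk") || y.equals("unk") || x.equals(y);
    }

    // Illustrative feature vector for a candidate pair:
    // [exact string identity, same gender, same number].
    static double[] features(Mention a, Mention b) {
        return new double[] {
            a.text.equalsIgnoreCase(b.text) ? 1.0 : 0.0,
            a.gender.equals(b.gender) ? 1.0 : 0.0,
            a.number.equals(b.number) ? 1.0 : 0.0
        };
    }

    // Generate all antecedent/anaphor pairs that pass the heuristic filter.
    static List<Mention[]> selectPairs(List<Mention> mentions) {
        List<Mention[]> pairs = new ArrayList<>();
        for (int i = 0; i < mentions.size(); i++)
            for (int j = i + 1; j < mentions.size(); j++)
                if (compatible(mentions.get(i), mentions.get(j)))
                    pairs.add(new Mention[] { mentions.get(i), mentions.get(j) });
        return pairs;
    }

    public static void main(String[] args) {
        List<Mention> ms = new ArrayList<>();
        ms.add(new Mention("Mary", "f", "sg"));
        ms.add(new Mention("she", "f", "sg"));
        ms.add(new Mention("they", "unk", "pl"));
        // Only (Mary, she) survives the filter: "they" disagrees in number.
        System.out.println(selectPairs(ms).size());               // prints 1
        System.out.println(features(ms.get(0), ms.get(1))[1]);    // prints 1.0
    }
}
```

In a full pipeline the resulting feature vectors, not the raw mentions, would be what the perceptron is trained on; the filter mainly serves to keep the number of negative training pairs manageable.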
Deprecated or no longer available:
- Wikimeta : The Wikimeta platform was the achievement of 4 years of research in the fields of machine learning, information extraction, natural language processing and semantic annotation. It provided a high-quality information extraction engine, including high-level text mining with unique functionality. The performance of Wikimeta was evaluated on standard corpora and in scientific evaluation campaigns with state-of-the-art metrics. The Named Entity Recognition module of Wikimeta was derived from the one used in the ESTER 2 evaluation campaign (LIA team) for the Named Entity Recognition task, which obtained the best overall performance. Wikimeta's life cycle ended with the TAC 2014 KBP evaluation campaign, where it was used for the last time.