I build (sometimes with the help of very proficient people) software prototypes for machine learning applications, mostly to solve Information Retrieval, Text Mining and Natural Language Processing tasks. The software described on this page was presented in various national and international scientific evaluation campaigns.

I code mostly in Perl and Java and use SVM Lib, BoosTexter, CRF++ and Weka for machine learning purposes.

Evaluation Campaigns

DEFT – Text Mining
2013 Collaborative tagging 1st system (task1)
2008 Classification 1st system
2007 Opinion mining 2nd system (student)
TAC-KBP – Information extraction and retrieval
2014 Entity Linking 10th system
2013 Entity Linking 13th overall / 3rd no-wiki
CoNLL Shared Task – Natural Language Processing
2011 Co-Reference 9th system (after corrections to the metrics decided by the revision committee in December 2013; updated results to appear on the website and in a future paper)
Ester 2 – Information extraction
2008 Named Entities 1st system (3 tasks out of 4)


  • Text Miner for DEFT 2013 : The DEFT challenge is an annual French-language text mining evaluation challenge. Its 9th edition focused on the automatic analysis of recipes in French. This system obtained the best results on task 1 and the 2nd best results on task 2 of the DEFT 2013 campaign.
  • SemLinker : a system built for the NIST-TAC KBP 2013 evaluation campaign. SemLinker is an experiment platform intended to study and solve various aspects of semantic annotation. An improved version developed by the CSFG team was deployed for the NIST 2014 evaluation.
  • NLGbAse : NLGbAse is an architecture for producing metadata and components devoted to Natural Language Processing and to semantic analysis and labeling tasks. NLGbAse transforms encyclopedic text content into structured knowledge fully integrated with the LinkedData network and the Semantic Web.
  • Poly-co : a coreference solver. This system integrates a multilayer perceptron classifier in a pipeline approach. Heuristics are used to select the pairs of coreference candidates fed to the network for training, as well as in our feature selection method. The features used in our approach are based on similarity and identity measures, on filtering information such as gender and number, and on other syntactic information. Evaluated in the CoNLL 2011 Shared Task.
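
To make the Poly-co description more concrete, here is a minimal sketch of what pairwise candidate selection and feature extraction can look like. It is not the Poly-co code: the `Mention` class, the sentence-gap heuristic, the `max_sentence_gap` parameter and the four features are all assumptions chosen for illustration. The resulting vectors are what a classifier such as a multilayer perceptron would take as input.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention record; the field names are illustrative."""
    text: str
    gender: str    # "m", "f", "n" or "unknown"
    number: str    # "sg", "pl" or "unknown"
    sentence: int  # index of the sentence containing the mention


def token_overlap(a, b):
    """Jaccard similarity between the lowercased token sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def agree(a, b):
    """1.0 if two attribute values are compatible (equal, or one is unknown)."""
    return 1.0 if "unknown" in (a, b) or a == b else 0.0


def pair_features(m1, m2):
    """Feature vector for a candidate pair: identity, similarity, gender, number."""
    return [
        1.0 if m1.text.lower() == m2.text.lower() else 0.0,
        token_overlap(m1.text, m2.text),
        agree(m1.gender, m2.gender),
        agree(m1.number, m2.number),
    ]


def candidate_pairs(mentions, max_sentence_gap=2):
    """Assumed heuristic: keep only pairs of mentions that are close in the text."""
    for m1, m2 in combinations(mentions, 2):
        if abs(m1.sentence - m2.sentence) <= max_sentence_gap:
            yield m1, m2


mentions = [
    Mention("Barack Obama", "m", "sg", 0),
    Mention("Obama", "m", "sg", 1),
    Mention("the senators", "unknown", "pl", 5),
]
vectors = [pair_features(a, b) for a, b in candidate_pairs(mentions)]
```

Filtering pairs before classification keeps the training set small and less skewed towards negative examples, which is one common reason for using this kind of heuristic in pairwise coreference models.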

Deprecated or no longer available:

  • Wikimeta : The Wikimeta platform was the outcome of 4 years of research in the fields of machine learning, information extraction, natural language processing and semantic annotation. It provided a high-quality information extraction engine, including high-level text mining with unique functionality. Wikimeta's performance was evaluated on standard corpora and in scientific evaluation campaigns with state-of-the-art metrics. The Named Entity Recognition module of Wikimeta is derived from the one used by the LIA team in the ESTER 2 evaluation campaign for the Named Entity Recognition task, where it obtained the best overall performance. Wikimeta's life cycle ended with the TAC 2014 KBP evaluation campaign, where it was used for the last time.
