Research Area:  Data Mining
Event Mining is the set of information extraction tasks that aim to extract events from text, identifying the what (action or event category), the who (actors and targets), the when (date), and the where (location). Extracting events requires a number of automated steps to recognize each of these components. Current state-of-the-art Event Extraction systems rely on batch learning, but analysts need to understand socio-political conflicts in near real time. Therefore, a key research question is how to perform Event Mining in near-real-time scenarios.
In this thesis, a novel framework was developed to deal with Event Extraction, Event Detection, Event Classification, and Argument Classification, using online learning and prequential testing to work in near-real-time scenarios. The framework was tested on three social science datasets of the Afghanistan-Pakistan conflict, using events reported by news, social media, and local experts (ACLED). A novel method called SPLICER was built to tackle these tasks in real time, using Stream Mining models in a multi-layered constraint learning approach.
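Prequential testing scores a model on each incoming example before the model is allowed to learn from it, so accuracy is measured on unseen data without a separate held-out set. The following is a minimal sketch of that test-then-train loop, not the thesis's implementation: the `OnlineNaiveBayes` class, the `prequential_accuracy` function, and the toy token/label stream are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class OnlineNaiveBayes:
    """Hypothetical toy learner: multinomial naive Bayes updated per example."""

    def __init__(self):
        self.class_counts = Counter()             # examples seen per label
        self.token_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()                        # all tokens seen so far

    def predict(self, tokens):
        if not self.class_counts:
            return None  # nothing learned yet
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            # Laplace smoothing; the extra +1 reserves mass for unseen tokens.
            denom = sum(counts.values()) + len(self.vocab) + 1
            score = math.log(n / total)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def learn(self, tokens, label):
        self.class_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.vocab.update(tokens)


def prequential_accuracy(stream, model):
    """Test-then-train: predict each example first, then learn from it."""
    correct = 0
    seen = 0
    for tokens, label in stream:
        if model.predict(tokens) == label:
            correct += 1
        seen += 1
        model.learn(tokens, label)  # the label is revealed only after scoring
    return correct / seen if seen else 0.0


# Illustrative stream of (tokens, event category) pairs; not thesis data.
toy_stream = [
    (["attack", "bomb"], "violence"),
    (["talks", "peace"], "diplomacy"),
    (["bomb", "blast"], "violence"),
    (["peace", "treaty"], "diplomacy"),
]
accuracy = prequential_accuracy(toy_stream, OnlineNaiveBayes())
```

Because every example is used once for testing and once for training, the running accuracy reflects how the model would have performed had it been deployed on the stream from the start.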
In addition to improvements over Stream Mining algorithms, this thesis addresses the question of how to automatically combine multiple sources of information for stream classification. We propose SLICER, a Stream Mining ensemble that handles stream partitioning automatically. It assesses when and how it is better to "horizontally" split a stream dataset to build multiple local models to boost global models. SLICER was tested on single-layer Event Mining tasks, showing better results than single Stream Mining baselines and classic Stream Mining ensembles.
Finally, SPLICER and SLICER were used jointly, improving results by 3 to 5% for event extraction tasks. In conclusion, Stream Mining algorithms can be used efficiently for Event Mining and other information extraction tasks, provided that horizontal partitions are made carefully, using Information Gain or Gini measures to split the source into multiple streams.
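As an illustration of how Information Gain or Gini measures can rank candidate attributes for a horizontal split, the sketch below computes the impurity reduction obtained by partitioning labelled records on one attribute. It is an assumption-laden example, not SLICER itself: the function names, the record layout (dictionaries with a `label` key), and the sample records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_gain(records, attribute, impurity=entropy):
    """Impurity reduction from partitioning records by one attribute.

    With impurity=entropy this is Information Gain; with impurity=gini
    it is the Gini gain.
    """
    labels = [r["label"] for r in records]
    if not labels:
        return 0.0
    groups = defaultdict(list)
    for r in records:
        groups[r[attribute]].append(r["label"])
    weighted = sum(len(g) / len(labels) * impurity(g)
                   for g in groups.values())
    return impurity(labels) - weighted


# Hypothetical records: "source" separates the labels, "lang" does not.
records = [
    {"source": "news", "lang": "en", "label": "battle"},
    {"source": "news", "lang": "en", "label": "battle"},
    {"source": "social", "lang": "en", "label": "protest"},
    {"source": "social", "lang": "en", "label": "protest"},
]
candidates = ["source", "lang"]
best_attribute = max(candidates, key=lambda a: split_gain(records, a))
```

An attribute with a high gain produces sub-streams whose label distributions are purer than the original stream, which is the situation in which separate local models are most likely to help.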
Name of the Researcher:  John Calvo Martinez
Name of the Supervisor(s):  Wayne Wobcke
Year of Completion:  2019
University:  The University of New South Wales
Thesis Link:   Home Page Url