MSc Thesis: Weather Talk - extracting weather information by text mining
Vasileios Lampos, advised by Prof. Nello Cristianini
Weather Talk MSc Thesis
Weather Talk Visualisation Tool: using this visualisation tool, one can compare the official with the inferred weather map in all the investigated schemes for 120 days.
Weather Talk Poster
Part of the abstract
The main aim of this project was to design and implement a system able to infer the weather state of a location for a specific date by applying Bayesian inference models and statistical analysis to web observations. Additionally, we investigated various linear combinations of probabilistic schemes in which traffic information, the previous day's weather, or a weather prior probability contributes to the final decision. As a final extension, we visualised the weather inference results on a map.
Software packages and a weather ontology were developed for data collection and preprocessing. Parameterised Bayesian belief networks expressed the probabilistic correlation between the inferred and the official weather observations. During training, we determined the optimal parameters and then tested their absolute and relative performance. Experimental results indicate that the absolute and relative (p-values) performance of most of the schemes is significant. As a result, one may assume that similar or even more sophisticated information extraction models in different contexts will be able to deliver useful conclusions.
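To illustrate the general idea of a linear combination of probabilistic schemes, here is a minimal Python sketch. It is not the thesis's actual model: the state names, the probabilities, the mixing weights and the function names `posterior` and `combine` are all hypothetical, chosen only to show how a text-based Bayesian estimate could be blended with a second source such as the previous day's weather.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of weather states."""
    unnormalised = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(unnormalised.values())
    return {s: v / total for s, v in unnormalised.items()}


def combine(distributions, weights):
    """Weighted sum of distributions; weights are assumed to sum to 1."""
    states = distributions[0].keys()
    return {
        s: sum(w * d[s] for w, d in zip(weights, distributions))
        for s in states
    }


# Hypothetical numbers, for illustration only.
prior = {"rain": 0.4, "dry": 0.6}          # assumed weather prior
likelihood = {"rain": 0.3, "dry": 0.1}     # assumed P(web observations | state)
previous_day = {"rain": 0.5, "dry": 0.5}   # assumed previous-day scheme

text_post = posterior(prior, likelihood)
final = combine([text_post, previous_day], weights=[0.8, 0.2])
inferred = max(final, key=final.get)       # state with the highest combined score
print(final, inferred)
```

In a setup like this, the mixing weights would play the role of parameters chosen during training, while the combined distribution gives the final decision for a location and date.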
Part of the introduction
The aim of this project is to apply statistical analysis to web observations and to design
Bayesian inference schemes in order to extract information about the weather state of
a location on a specific date. Documents may include blogs, newsgroups, and news
articles, but not official weather-related sources. A further challenge is the implementation
of a data fusion model able to combine traffic information with weather observations and
achieve better performance.
Part of the conclusions
Weather Talk forms a web mining framework with an embedded ontology that bases its
decisions on Bayesian theory. In a period of three months, without the needed computational
power, and with all the limitations that we have mentioned in this chapter, we managed to
infer the two major weather states of a location with a success rate of 63.51%. As a result, the
most important outcome of this project is that this kind of information extraction is possible
and should now be applied to different contexts.


