Journal Articles

Frank Brüseke, Henning Wachsmuth, Gregor Engels, and Steffen Becker: PBlaman: Performance Blame Analysis based on Palladio Contracts. In Concurrency and Computation: Practice and Experience, vol. 26, no. 12, pp. 1975-2004 (2014)

@article{brueseke2013b,
  author  = {Frank Brüseke and Henning Wachsmuth and Gregor Engels and Steffen Becker},
  title   = {PBlaman: Performance Blame Analysis based on Palladio Contracts},
  journal = {Concurrency and Computation: Practice and Experience},
  year    = {2014},
  volume  = {26},
  number  = {12},
  pages   = {1975--2004}
}

In performance-driven software engineering, the performance of a system is evaluated through models before the system is assembled. After assembly, the performance is then validated using performance tests. When a component-based system fails certain performance requirements during the tests, it is important to find out whether individual components yield performance errors or whether the composition of components is faulty. This task is called performance blame analysis. Existing performance blame analysis approaches, as well as alternative error analysis approaches, are restricted: they either do not employ expected values, use expected values from regression testing, or use static developer-set limits. In contrast, this paper describes the new performance blame analysis approach PBlaman, which builds upon our previous work and employs the context-portable performance contracts of Palladio. PBlaman decides which components to blame by comparing the observed response time data series of each single component operation in a failed test case to the operation's expected response time data series derived from the contracts. System architects are then assisted by a visual presentation of the obtained analysis results. We exemplify the benefits of PBlaman in two case studies, each of which represents an application that follows a particular architectural style.
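
The core blame decision can be illustrated with a minimal sketch: an operation is blamed if its observed response times clearly exceed those promised by its contract. The quantile comparison and tolerance factor below are illustrative assumptions, not the concrete statistics PBlaman employs.

from statistics import quantiles

def blame(observed_ms, expected_ms, tolerance=1.1):
    """Blame an operation if its observed ~90th-percentile response time
    exceeds the contract-derived one by more than the tolerance factor."""
    observed_q90 = quantiles(observed_ms, n=10)[8]  # ninth cut point ~ 90th percentile
    expected_q90 = quantiles(expected_ms, n=10)[8]
    return observed_q90 > tolerance * expected_q90

# Toy data: the first operation meets its contract, the second does not.
expected = [10, 11, 12, 12, 13, 14, 15, 15, 16, 18]
print(blame([11, 12, 12, 13, 14, 14, 15, 16, 17, 19], expected))  # False
print(blame([25, 27, 28, 30, 31, 33, 35, 36, 38, 40], expected))  # True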

Peer-Reviewed Conference Papers

Henning Wachsmuth, Martin Trenkmann, Benno Stein, and Gregor Engels: Modeling Review Argumentation for Robust Sentiment Analysis. In Proceedings of the 25th International Conference on Computational Linguistics. Dublin City University and Association for Computational Linguistics (Dublin, Ireland), pp. 553-564 (2014)

@inproceedings{wachsmuth:2014b,
  author    = {Henning Wachsmuth and Martin Trenkmann and Benno Stein and Gregor Engels},
  title     = {Modeling Review Argumentation for Robust Sentiment Analysis},
  booktitle = {Proceedings of the 25th International Conference on Computational Linguistics},
  year      = {2014},
  pages     = {553--564},
  address   = {Dublin, Ireland},
  month     = {August},
  publisher = {Dublin City University and Association for Computational Linguistics}
}

Most text classification approaches model texts at the lexical and syntactic level only, lacking domain robustness and explainability. In tasks like sentiment analysis, such approaches can result in limited effectiveness if the texts consist of a series of arguments. In this paper, we claim that even a shallow model of the argumentation of the texts allows for an effective and more robust classification, while providing intuitive explanations of the classification results. Here, we apply this idea to the statistical prediction of sentiment scores for reviews. We combine existing ideas from sentiment analysis with novel features that compare the overall argumentation structure of a review text to a learned set of common sentiment flow patterns. Our evaluation in two domains demonstrates the benefit of modeling argumentation and its abstract structure for text classification in terms of effectiveness and domain robustness.
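
To make the flow-pattern features concrete, the following sketch length-normalizes a review's sequence of local sentiments and measures its distance to a small set of assumed common sentiment flow patterns; such distances can then serve as classification features. The two patterns and the interpolation scheme are illustrative assumptions, not the patterns learned in the paper.

def normalize_flow(flow, length=5):
    """Resample a sentiment sequence to a fixed length by linear interpolation."""
    if len(flow) == 1:
        return [float(flow[0])] * length
    result = []
    for i in range(length):
        pos = i * (len(flow) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(flow) - 1)
        result.append(flow[lo] + (pos - lo) * (flow[hi] - flow[lo]))
    return result

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Two assumed common patterns: "positive throughout" and "bad start, good end".
patterns = {"all_positive": [1, 1, 1, 1, 1], "recovery": [-1, -1, 0, 1, 1]}

review_flow = [-1, -1, -1, 0, 1, 1]   # local sentiments of the review's segments
normalized = normalize_flow(review_flow)
features = {name: distance(normalized, p) for name, p in patterns.items()}
print(min(features, key=features.get))  # -> 'recovery'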

Henning Wachsmuth, Martin Trenkmann, Benno Stein, Gregor Engels, and Tsvetomira Palakarska: A Review Corpus for Argumentation Analysis. In Proceedings of the 15th International Conference on Computational Linguistics and Intelligent Text Processing. Springer (Kathmandu, Nepal), LNCS, vol. 8404, no. 2, pp. 115-127 (2014)

@inproceedings{wachsmuth2014a,
  author    = {Henning Wachsmuth and Martin Trenkmann and Benno Stein and Gregor Engels and Tsvetomira Palakarska},
  title     = {A Review Corpus for Argumentation Analysis},
  booktitle = {Proceedings of the 15th International Conference on Computational Linguistics and Intelligent Text Processing},
  year      = {2014},
  volume    = {8404},
  number    = {2},
  series    = {LNCS},
  pages     = {115--127},
  address   = {Kathmandu, Nepal},
  month     = {April},
  publisher = {Springer}
}

The analysis of user reviews has become critical in research and industry, as user reviews increasingly impact the reputation of products and services. Many review texts comprise an involved argumentation with facts and opinions on different product features or aspects. Therefore, classifying sentiment polarity does not suffice to capture a review's impact. We claim that an argumentation analysis is needed, including opinion summarization, sentiment score prediction, and others. Since existing language resources to drive such research are missing, we have designed the ArguAna TripAdvisor corpus, which compiles 2,100 manually annotated hotel reviews balanced with respect to the reviews' sentiment scores. Each review text is segmented into facts, positive, and negative opinions, while all hotel aspects and amenities are marked. In this paper, we present the design and a first study of the corpus. We reveal patterns of local sentiment that correlate with sentiment scores, thereby defining a promising starting point for an effective argumentation analysis.
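
The annotation scheme can be pictured with a small sketch that turns segment-level labels into a local sentiment flow. The label set mirrors the paper's scheme (facts, positive, and negative opinions); the numeric mapping and the sample review are assumptions for illustration.

SENTIMENT = {"positive": 1, "fact": 0, "negative": -1}

def local_sentiment_flow(segments):
    """Map a review's annotated (text, label) segments to local sentiments."""
    return [SENTIMENT[label] for _, label in segments]

review = [
    ("The staff was wonderful.", "positive"),
    ("The hotel is next to the station.", "fact"),
    ("But the room was tiny and loud.", "negative"),
]
flow = local_sentiment_flow(review)
print(flow, "mean:", sum(flow) / len(flow))   # [1, 0, -1] mean: 0.0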

Henning Wachsmuth, Benno Stein, and Gregor Engels: Information Extraction as a Filtering Task. In Proceedings of the 22nd ACM Conference on Information and Knowledge Management. ACM (San Francisco, CA, USA), pp. 2049-2058 (2013)

@inproceedings{wachsmuth:2013c,
  author    = {Henning Wachsmuth and Benno Stein and Gregor Engels},
  title     = {Information Extraction as a Filtering Task},
  booktitle = {Proceedings of the 22nd ACM Conference on Information and Knowledge Management},
  year      = {2013},
  pages     = {2049--2058},
  address   = {San Francisco, CA, USA},
  publisher = {ACM}
}

Information extraction is usually approached as an annotation task: Input texts run through several analysis steps of an extraction process in which different semantic concepts are annotated and matched against the slots of templates. We argue that such an approach lacks an efficient control of the input of the analysis steps. In this paper, we hence propose and evaluate a model and a formal approach that consistently put the filtering view in the focus: Before spending annotation effort, filter those portions of the input texts that may contain relevant information for filling a template and discard the others. We model all dependencies between the semantic concepts sought for with a truth maintenance system, which in turn infers the portions of text to be annotated in each analysis step. The filtering view enables an information extraction system (1) to annotate only relevant portions of input texts and (2) to easily trade its run-time efficiency for its recall. We provide our approach as an open-source extension of the Apache UIMA framework and we show the potential of our approach in a number of experiments.
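
A minimal sketch of the filtering view: each analysis step runs only on the portions of text that may still be relevant and discards the rest before the next, more expensive step. The two regex "annotators" and the sample task (sentences containing both a money and a time expression) are illustrative assumptions, not the paper's truth-maintenance machinery.

import re

MONEY = re.compile(r"\$\d+(?:\.\d+)?\s*(?:million|billion)?")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

def filtering_pipeline(text):
    portions = re.split(r"(?<=[.!?])\s+", text)           # 1. cheap segmentation
    portions = [p for p in portions if MONEY.search(p)]   # 2. keep money sentences
    portions = [p for p in portions if YEAR.search(p)]    # 3. keep those with a year
    return portions                                       # only these are annotated further

text = ("Revenues grew strongly. Apple expects $400 million in 2014. "
        "The CEO did not comment.")
print(filtering_pipeline(text))  # ['Apple expects $400 million in 2014.']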

Henning Wachsmuth, Benno Stein, and Gregor Engels: Learning Efficient Information Extraction on Heterogeneous Texts. In Proceedings of the 6th International Joint Conference on Natural Language Processing. AFNLP (Nagoya, Japan), pp. 534-542 (2013)

@inproceedings{wachsmuth:2013b,
  author    = {Henning Wachsmuth and Benno Stein and Gregor Engels},
  title     = {Learning Efficient Information Extraction on Heterogeneous Texts},
  booktitle = {Proceedings of the 6th International Joint Conference on Natural Language Processing},
  year      = {2013},
  pages     = {534--542},
  address   = {Nagoya, Japan},
  month     = {October},
  publisher = {AFNLP}
}

From an efficiency viewpoint, information extraction means to filter the relevant portions of natural language texts as fast as possible. Given an extraction task, different pipelines of algorithms can be devised that provide the same precision and recall but that vary in their run-time due to different pipeline schedules. While recent research has investigated how to determine the run-time optimal schedule for a collection or a stream of texts, this paper goes one step beyond: we analyze the run-time of efficient schedules as a function of the heterogeneity of texts and we show how this heterogeneity is characterized from a data perspective. For extraction tasks on heterogeneous big data, we present a self-supervised online adaptation approach that learns to predict the optimal schedule depending on the input text. Our evaluation suggests that the approach will significantly improve efficiency on collections and streams of texts of high heterogeneity.
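
The following sketch illustrates the idea of self-supervised online schedule adaptation: a cheap text characteristic selects the schedule predicted to be fastest, and occasionally the learner explores by timing all schedules on the current text and updating its statistics. The feature, the two schedules, and the timing stub are invented for illustration and do not reproduce the paper's learning method.

import random
from collections import defaultdict

SCHEDULES = ["money->time", "time->money"]

def density_bucket(text):
    """Coarse feature: '$' signs per 100 characters, capped at 3."""
    return min(3, int(100 * text.count("$") / max(1, len(text))))

def run_time(schedule, text):
    # Stub standing in for actually executing the pipeline; in the real
    # setting this would be the measured run-time on the text.
    base = len(text) / 1000
    return base * (0.5 if schedule == "money->time" and "$" in text else 1.0)

avg_time = defaultdict(lambda: {s: 1.0 for s in SCHEDULES})  # per-bucket averages

def process(text, explore=0.1):
    bucket = density_bucket(text)
    if random.random() < explore:                          # self-supervision step:
        times = {s: run_time(s, text) for s in SCHEDULES}  # time all schedules
        for s, t in times.items():                         # and update the statistics
            avg_time[bucket][s] = 0.9 * avg_time[bucket][s] + 0.1 * t
        return min(times, key=times.get)
    return min(avg_time[bucket], key=avg_time[bucket].get)  # exploit the prediction

print(process("Apple expects $400 million in 2014."))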

Henning Wachsmuth, Mirko Rose, and Gregor Engels: Automatic Pipeline Construction for Real-Time Annotation. In Proceedings of the 14th International Conference on Intelligent Text Processing and Computational Linguistics. Springer (Samos, Greece), LNCS, vol. 7816, pp. 38-49 (2013)

@inproceedings{wachsmuth:2013a,
  author    = {Henning Wachsmuth and Mirko Rose and Gregor Engels},
  title     = {Automatic Pipeline Construction for Real-Time Annotation},
  booktitle = {Proceedings of the 14th International Conference on Intelligent Text Processing and Computational Linguistics},
  year      = {2013},
  volume    = {7816},
  series    = {LNCS},
  pages     = {38--49},
  address   = {Samos, Greece},
  month     = {March},
  publisher = {Springer}
}

Many annotation tasks in computational linguistics are tackled with manually constructed pipelines of algorithms. In real-time tasks where information needs are stated and addressed ad-hoc, however, manual construction is infeasible. This paper presents an artificial intelligence approach to automatically construct annotation pipelines for given information needs and quality prioritizations. Based on an abstract ontological model, we use partial order planning to select a pipeline's algorithms and informed search to obtain an efficient pipeline schedule. We realized the approach as an expert system on top of Apache UIMA, which offers evidence that pipelines can be constructed ad-hoc in near-zero time.
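
A compact sketch of the construction idea: algorithms are described by the annotation types they require and produce, and backward chaining from the information need selects a sufficient set in a dependency-respecting order. The algorithm repository below is an invented toy example; the paper's planner additionally optimizes the schedule, which this sketch omits.

# type requirements/products of each algorithm; assumes the repository covers the need
REPOSITORY = {
    "tokenizer":    (set(),            {"token"}),
    "pos_tagger":   ({"token"},        {"pos"}),
    "time_tagger":  ({"token"},        {"time"}),
    "money_tagger": ({"token", "pos"}, {"money"}),
}

def construct_pipeline(need):
    """Select and order algorithms so that all types in 'need' are produced."""
    pipeline, provided = [], set()

    def satisfy(goal):
        if goal in provided:
            return
        # pick any algorithm producing the goal type (no cost model here)
        name = next(n for n, (_, out) in REPOSITORY.items() if goal in out)
        requires, produces = REPOSITORY[name]
        for prerequisite in requires:       # backward chaining over inputs
            satisfy(prerequisite)
        if name not in pipeline:
            pipeline.append(name)
            provided.update(produces)

    for goal in need:
        satisfy(goal)
    return pipeline

print(construct_pipeline({"money", "time"}))
# e.g. ['tokenizer', 'pos_tagger', 'money_tagger', 'time_tagger']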

Henning Wachsmuth and Benno Stein: Optimal Scheduling of Information Extraction Algorithms. In Proceedings of the 24th International Conference on Computational Linguistics: Posters. The COLING 2012 Organizing Committee (Mumbai, India), pp. 1281-1290 (2012)

@inproceedings{wachsmuth:2012,
  author    = {Henning Wachsmuth and Benno Stein},
  title     = {Optimal Scheduling of Information Extraction Algorithms},
  booktitle = {Proceedings of the 24th International Conference on Computational Linguistics: Posters},
  year      = {2012},
  pages     = {1281--1290},
  address   = {Mumbai, India},
  publisher = {The COLING 2012 Organizing Committee}
}

Most research on run-time efficiency in information extraction is of empirical nature. This paper analyzes the efficiency of information extraction pipelines from a theoretical point of view in order to explain empirical findings. We argue that information extraction can, at its heart, be viewed as a relevance filtering task whose efficiency traces back to the run-times and selectivities of the employed algorithms. To better understand the intricate behavior of information extraction pipelines, we develop a sequence model for scheduling a pipeline's algorithms. In theory, the most efficient schedule corresponds to the Viterbi path through this model and can hence be found by dynamic programming. For real-time applications, it might be too expensive to compute all run-times and selectivities beforehand. However, our model implies the benchmarks of filtering tasks and illustrates that the optimal schedule depends on the distribution of relevant information in the input texts. We give formal and experimental evidence where necessary.
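
The central quantity can be worked through in a few lines: the expected run-time of a schedule follows from each filter's run-time per portion of text and its selectivity (the fraction of portions it lets pass). The numbers below are made up for illustration, and where the paper finds the optimum as a Viterbi path via dynamic programming, this sketch simply enumerates all schedules.

from itertools import permutations

# algorithm -> (run-time per portion of text, selectivity = fraction kept)
FILTERS = {"time": (1.0, 0.3), "money": (2.0, 0.1), "organization": (4.0, 0.5)}

def expected_cost(schedule):
    cost, remaining = 0.0, 1.0
    for name in schedule:
        run_time, selectivity = FILTERS[name]
        cost += remaining * run_time   # only surviving portions are analyzed
        remaining *= selectivity       # the filter thins out the input further
    return cost

best = min(permutations(FILTERS), key=expected_cost)
print(best, round(expected_cost(best), 2))
# ('time', 'money', 'organization') 1.72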

Henning Wachsmuth and Kathrin Bujna: Back to the Roots of Genres: Text Classification by Language Function. In Proceedings of the 5th International Joint Conference on Natural Language Processing. AFNLP (Chiang Mai, Thailand), pp. 632-640 (2011)

@inproceedings{wachsmuth2011b,
  author    = {Henning Wachsmuth and Kathrin Bujna},
  title     = {Back to the Roots of Genres: Text Classification by Language Function},
  booktitle = {Proceedings of the 5th International Joint Conference on Natural Language Processing},
  year      = {2011},
  pages     = {632--640},
  address   = {Chiang Mai, Thailand},
  publisher = {AFNLP}
}

The term "genre" covers different aspects of both texts and documents, and it has led to many classification schemes. This makes different approaches to genre identification incomparable and the task itself unclear. We introduce the linguistically motivated text classification task language function analysis, LFA, which focuses on one well-defined aspect of genres. The aim of LFA is to determine whether a text is predominantly expressive, appellative, or informative. LFA can be used in search and mining applications to efficiently filter documents of interest. Our approach to LFA relies on fast machine learning classifiers with features from different research areas. We evaluate this approach on a new corpus with 4,806 product texts from two domains. Within one domain, we correctly classify up to 82% of the texts, but differences in feature distribution limit accuracy on out-of-domain data.

Henning Wachsmuth, Benno Stein, and Gregor Engels: Constructing Efficient Information Extraction Pipelines. In Proceedings of the 20th ACM Conference on Information and Knowledge Management. ACM (Glasgow, Scotland), pp. 2237-2240 (2011)

@inproceedings{wachsmuth2011a,
  author    = {Henning Wachsmuth and Benno Stein and Gregor Engels},
  title     = {Constructing Efficient Information Extraction Pipelines},
  booktitle = {Proceedings of the 20th ACM Conference on Information and Knowledge Management},
  year      = {2011},
  pages     = {2237--2240},
  address   = {Glasgow, Scotland},
  month     = {October},
  publisher = {ACM}
}

Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail to achieve acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much "efficiency potential" depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.
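
Question (1) can be sketched with the same run-time/selectivity view as in the scheduling paper above: the "efficiency potential" is the ratio between the most and least efficient schedule of the same algorithms, measured system-independently via how much input each algorithm has to analyze. The run-times and selectivities are invented; with realistic values and larger algorithm sets, the gap can reach the order of magnitude reported in the paper.

from itertools import permutations

# algorithm -> (run-time per portion, selectivity); values invented
FILTERS = {"forecast": (3.0, 0.05), "time": (1.0, 0.2), "money": (2.0, 0.1)}

def analyzed_effort(schedule):
    """Total effort the schedule spends per portion of input text."""
    total, remaining = 0.0, 1.0
    for name in schedule:
        run_time, selectivity = FILTERS[name]
        total += remaining * run_time
        remaining *= selectivity
    return total

efforts = [analyzed_effort(s) for s in permutations(FILTERS)]
print(f"efficiency potential: factor {max(efforts) / min(efforts):.1f}")
# -> factor 2.1 with these invented numbers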

Henning Wachsmuth, Peter Prettenhofer, and Benno Stein: Efficient Statement Identification for Automatic Market Forecasting. In Proceedings of the 23rd International Conference on Computational Linguistics. ACM (Beijing, China), pp. 1128-1136 (2010)

@inproceedings{wachsmuth2010a,
  author    = {Henning Wachsmuth and Peter Prettenhofer and Benno Stein},
  title     = {Efficient Statement Identification for Automatic Market Forecasting},
  booktitle = {Proceedings of the 23rd International Conference on Computational Linguistics},
  year      = {2010},
  pages     = {1128--1136},
  address   = {Beijing, China},
  month     = {August},
  publisher = {ACM}
}

Strategic business decision making involves the analysis of market forecasts. Today, the identification and aggregation of relevant market statements is done by human experts, often by analyzing documents from the World Wide Web. We present an efficient information extraction chain to automate this complex natural language processing task and show results for the identification part. Based on time and money extraction, we identify sentences that represent statements on revenue using support vector classification. We provide a corpus with German online news articles, in which more than 2,000 such sentences were annotated by domain experts from the industry. On the test data, our identification algorithm achieves overall precision and recall of 0.86 and 0.87, respectively.
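
A sketch of the identification step: sentences are candidates for revenue statements only if they contain both a time and a money expression, after which a classifier decides (a support vector machine in the paper; a simple keyword rule stands in here). The German-style patterns are simplified assumptions.

import re

TIME = re.compile(r"\b(?:19|20)\d{2}\b")
MONEY = re.compile(r"\d+(?:,\d+)?\s*(?:Mio\.|Mrd\.|Millionen|Milliarden)?\s*Euro")
REVENUE_CUES = ("Umsatz", "Erlös", "revenue")

def is_revenue_statement(sentence):
    if not (TIME.search(sentence) and MONEY.search(sentence)):
        return False                      # cheap filter before classification
    return any(cue in sentence for cue in REVENUE_CUES)  # stand-in for the SVM

print(is_revenue_statement(
    "Der Umsatz soll 2010 auf 460 Mio. Euro steigen."))   # True
print(is_revenue_statement(
    "Die Aktie stieg 2010 um 5 Prozent."))                # False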

Stephan Arens, Alexander Buss, Helena Deck, Miroslaw Dynia, Matthias Fischer, Holger Hagedorn, Peter Isaak, Jaroslaw Kutylowski, Friedhelm Meyer auf der Heide, Viktor Nesterow, Adrian Ogiermann, Boris Stobbe, Thomas Storm, and Henning Wachsmuth: Smart Teams: Simulating Large Robotic Swarms in Vast Environments. In Proceedings of the 4th International Symposium on Autonomous Minirobots for Research and Edutainment. Heinz Nixdorf Institut, University of Paderborn (Buenos Aires, Argentina), pp. 215-222 (2007)

@inproceedings{Wachsmuth2007,
  author    = {Stephan Arens and Alexander Buss and Helena Deck and Miroslaw Dynia and Matthias Fischer and Holger Hagedorn and Peter Isaak and Jaroslaw Kutylowski and Friedhelm Meyer auf der Heide and Viktor Nesterow and Adrian Ogiermann and Boris Stobbe and Thomas Storm and Henning Wachsmuth},
  title     = {Smart Teams: Simulating Large Robotic Swarms in Vast Environments},
  booktitle = {Proceedings of the 4th International Symposium on Autonomous Minirobots for Research and Edutainment},
  year      = {2007},
  pages     = {215--222},
  address   = {Buenos Aires, Argentina},
  month     = {October},
  publisher = {Heinz Nixdorf Institut, University of Paderborn}
}

We consider the problem of exploring an unknown environment using a swarm of autonomous robots whose collective behavior emerges from their local rules. Each robot has only a very restricted view of the environment, which makes cooperation difficult. We introduce a software system that is capable of simulating a large number of such robots (e.g., 1,000) on highly complex terrains with millions of obstacles. Its main purpose is to easily integrate and evaluate any kind of algorithm for controlling the robot behavior. The simulation may be observed in real-time via a visualization that displays both the individual and the collective progress of the robots. We present the system design, its main features, and its underlying concepts.

Dissertations

Henning Wachsmuth: Pipelines for Ad-hoc Large-scale Text Mining. PhD thesis (2015)

@phdthesis{wachsmuth2015a,
  author = {Henning Wachsmuth},
  title  = {Pipelines for Ad-hoc Large-scale Text Mining},
  year   = {2015}
}

Today's web search and big data analytics applications aim to address information needs (typically given in the form of search queries) ad-hoc on large numbers of texts. In order to directly return relevant information instead of only potentially relevant texts, these applications have begun to employ text mining. The term text mining covers tasks that deal with the inference of structured high-quality information from collections and streams of unstructured input texts. Text mining requires task-specific text analysis processes that may consist of several interdependent steps, which are realized with sequences of algorithms from information extraction, text classification, and natural language processing. However, the use of such text analysis pipelines is still restricted to addressing a few predefined information needs.

We argue that the reasons for this are threefold: First, text analysis pipelines are usually constructed manually for the given information need and input texts, because their design requires expert knowledge about the algorithms to be employed. When information needs that are unknown beforehand have to be addressed, text mining hence cannot be performed ad-hoc. Second, text analysis pipelines tend to be inefficient in terms of run-time, because their execution often includes analyzing texts with computationally expensive algorithms. When information needs have to be addressed ad-hoc, text mining hence cannot be performed at large scale. And third, text analysis pipelines tend not to robustly achieve high effectiveness on all texts, because their results are often inferred by algorithms that rely on domain-dependent features of texts. Hence, text mining currently cannot guarantee to infer high-quality information.

In this thesis, we contribute to the question of how to address information needs from text mining ad-hoc in an efficient and domain-robust manner. We observe that knowledge about a text analysis process, as well as information obtained within the process, helps to improve the design, the execution, and the results of the pipeline that realizes the process. To this end, we apply different techniques from classical and statistical artificial intelligence. In particular, we first develop knowledge-based approaches for ad-hoc pipeline construction and for an optimal execution of a pipeline on its input. Then, we show theoretically and practically how to optimize and adapt the schedule of the algorithms in a pipeline based on information in the analyzed input texts in order to maximize execution efficiency. Finally, we statistically learn patterns in the argumentation structures of texts that remain strongly invariant across domains and that thereby allow for more robust analysis results in a restricted set of tasks.

We formally analyze all developed approaches and implement them as open-source software applications. Based on these applications, we evaluate the approaches on established and newly created collections of texts for scientifically and industrially important text analysis tasks, such as financial event extraction and fine-grained sentiment analysis. Our findings show that text analysis pipelines can be designed automatically that process only those portions of text that are relevant to the information need at hand. Through scheduling, the run-time efficiency of pipelines can be improved, in some cases by more than one order of magnitude, while maintaining effectiveness. Moreover, we provide evidence that a pipeline's domain robustness substantially benefits from focusing on argumentation structure in tasks like sentiment analysis. We conclude that our approaches denote essential building blocks for enabling ad-hoc large-scale text mining in web search and big data analytics applications.

Diploma Theses

Henning Wachsmuth: Kooperative Bewegungsstrategien für Roboter in unbekannten, merkmalsarmen Umgebungen (Cooperative Motion Strategies for Robots in Unknown, Feature-Poor Environments). Diploma thesis (2009)

@mastersthesis{Wachsmuth2009,
  author = {Henning Wachsmuth},
  title  = {Kooperative Bewegungsstrategien für Roboter in unbekannten, merkmalsarmen Umgebungen},
  year   = {2009},
  type   = {Diploma thesis},
  month  = {January}
}

Localization, the task of estimating a robot's position and orientation, is one of the fundamental problems of current research in autonomous mobile robotics. Recently, the use of multiple robots has received increased attention as a means to improve localization accuracy, especially in unknown and featureless environments. This thesis investigates from a theoretical point of view how a team of robots, each equipped with a fixed monocular camera, can move cooperatively in order to reduce positional uncertainty. Vision-based localization is done by the robots using each other as mobile landmarks. To this end, a new concept for computing the relative distance and angle between robots from single camera images is introduced and formally analyzed. Based on this concept, different principal approaches to cooperative motion strategies are developed for a global coordinate system that allow the given robots to keep track of their locations. These strategies do not require any knowledge about the environment and thus work properly even in unknown and unstructured domains.
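
The underlying geometry can be sketched with a pinhole-camera model: a robot of known physical height appearing with a certain pixel height yields its distance, and its horizontal pixel offset from the image center yields the relative bearing. The focal length and measurements below are assumed values; the thesis' actual derivation and error analysis are more involved.

import math

FOCAL_PX = 800.0        # assumed focal length in pixels
ROBOT_HEIGHT_M = 0.30   # known physical height of the observed robot
IMAGE_CENTER_X = 320.0  # principal point of a 640-pixel-wide image

def relative_pose(pixel_height, pixel_x):
    """Distance (m) and bearing (rad) of a robot seen in a single image."""
    distance = FOCAL_PX * ROBOT_HEIGHT_M / pixel_height   # similar triangles
    angle = math.atan((pixel_x - IMAGE_CENTER_X) / FOCAL_PX)
    return distance, angle

d, a = relative_pose(pixel_height=60.0, pixel_x=480.0)
print(f"{d:.2f} m at {math.degrees(a):.1f} deg")   # 4.00 m at 11.3 deg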