News

09/01/10

InfexBA in the local press

With the headline "Forecasting the future of markets – s-lab develops automatic market, trend and sentiment analysis for companies", the electronic university press Forschung InSight has published an article on the work of the University of Paderborn on the InfexBA project.

Link to the article: http://www.cs.uni-paderborn.de/insight

08/20/10

Firefox Add-on "RevMarker BA" out now

As of today, the first version of the freely available InfexBA add-on "RevMarker BA" for Mozilla Firefox can be downloaded from the Results page.

This add-on integrates the InfexBA market analysis into the widely used open source browser Mozilla Firefox. Any currently opened webpage is analyzed to identify statements on the revenues of companies and markets, either automatically or at the press of a button. The statements found are highlighted directly in the browser.

06/21/10

Music and smartphone corpus available

An annotated text corpus for the development and evaluation of classification techniques concerning genre and sentiment analysis is freely available on the Results page.

Also, the revenue corpus has moved to that page. All further information is given there.

Besides the corpora, the Results page contains an overview of the publications related to the project. In the future, it will be extended with download links for freely available software.

06/14/10

Paper at COLING 2010 in Beijing

In connection with the InfexBA project, a scientific paper titled "Efficient Statement Identification for Automatic Market Forecasting" will be published. The paper has been accepted at COLING, the world's largest computational linguistics conference, a biennial meeting that will take place from August 23 to August 27, 2010, in Beijing.

The publication introduces the overall information extraction process of market analysis and shows results for the identification of statements on revenue. In addition, it presents the revenue corpus collected and annotated by Resolto, thereby making the corpus available to the scientific community.

The paper was written by Henning Wachsmuth, Peter Prettenhofer and Benno Stein.

02/22/10

Revenue corpus available

An annotated text corpus for the development and evaluation of Information Extraction techniques for market information is available on this site's homepage.

The corpus consists of 1128 German online news articles taken from 29 popular websites (the source URL is given in the document text). Every document comes as an XMI file. In the accompanying split, 2/3 of the documents form the training set, and 1/6 each form the validation set and the test set.
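To make the proportions concrete, here is a minimal Python sketch of a 2/3 : 1/6 : 1/6 split over 1128 documents. It only illustrates the resulting set sizes. The function name `split_corpus`, the shuffling, and the seed are assumptions made for this example; the corpus comes with its own fixed split, which this sketch does not reproduce.

```python
import random


def split_corpus(doc_ids, seed=0):
    """Shuffle document ids and split them 2/3 : 1/6 : 1/6.

    Hypothetical helper for illustration only; the corpus itself
    is distributed with a predefined split.
    """
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)

    n_train = len(ids) * 2 // 3          # two thirds for training
    n_val = (len(ids) - n_train) // 2    # half of the rest for validation

    train = ids[:n_train]
    validation = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]         # remaining documents for testing
    return train, validation, test


train, validation, test = split_corpus(range(1128))
print(len(train), len(validation), len(test))  # 752 188 188
```

With 1128 documents, the proportions divide evenly: 752 training, 188 validation, and 188 test documents.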

In each document, every sentence that contains a statement on the revenues of an organization or a market is marked. Additionally, money and time expressions as well as the matter and the author of a statement are annotated and linked to the corresponding sentence annotation. Altogether, 2044 statements on revenue are tagged in this way.
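The annotation scheme described above can be pictured as one record per statement that links the other annotations to the sentence. The following Python sketch is an illustration under assumptions: the class name `RevenueStatement`, the field names, and the representation of annotations as character offsets are hypothetical and do not reflect the type system actually used in the XMI files. The example sentence is made up and in English, whereas the corpus itself is German.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed representation: an annotation is a (begin, end) character span.
Span = Tuple[int, int]


@dataclass
class RevenueStatement:
    """Hypothetical record for one annotated statement on revenue."""
    sentence: Span                                   # the marked sentence
    money: List[Span] = field(default_factory=list)  # money expressions
    time: List[Span] = field(default_factory=list)   # time expressions
    matter: Optional[Span] = None                    # what the statement is about
    author: Optional[Span] = None                    # who made the statement


def span_of(text: str, phrase: str) -> Span:
    """Return the span of the first occurrence of phrase in text."""
    begin = text.index(phrase)
    return (begin, begin + len(phrase))


def covered_text(text: str, span: Span) -> str:
    """Return the text covered by a span."""
    begin, end = span
    return text[begin:end]


# Made-up example sentence, not taken from the corpus.
text = "Analysts expect the market to reach 5 billion euros in 2012."
statement = RevenueStatement(
    sentence=(0, len(text)),
    money=[span_of(text, "5 billion euros")],
    time=[span_of(text, "2012")],
    matter=span_of(text, "the market"),
    author=span_of(text, "Analysts"),
)
print(covered_text(text, statement.money[0]))
```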

02/09/10

Annotated corpus with opinions on smartphones and music

The choice of documents for the smartphone and music corpus and the annotation of these documents are complete for now.

The corpus contains 2101 German blog posts from the smartphone domain as well as 3407 professional or personal reviews from the music domain. Each text was classified along two dimensions. First, the genre of the text was annotated, i.e., whether it has a commercial, informational or personal background. Second, the polarity of the opinion on the given topic (positive, negative, or neutral) was tagged if an opinion is given in the text.

We think that this corpus is a very good starting point for developing algorithms that predict the sentiment of a text and aggregate such classifications to obtain results on the sentiment and buzz of a specified topic.
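As a rough illustration of what aggregating such classifications could look like, the Python sketch below counts labeled texts per topic. Everything in it is an assumption made for this example: the `LabeledText` record, the `topic` field, the use of the number of on-topic texts as a buzz measure, and the polarity shares as a sentiment measure. The project's actual aggregation method is not described here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

GENRES = ("commercial", "informational", "personal")
POLARITIES = ("positive", "negative", "neutral")


@dataclass
class LabeledText:
    """Hypothetical record for one annotated text."""
    topic: str
    genre: str               # one of GENRES
    polarity: Optional[str]  # one of POLARITIES, or None if no opinion is given


def aggregate(texts: Iterable[LabeledText], topic: str) -> dict:
    """Aggregate per-text labels into simple topic-level figures.

    Assumption: buzz is the number of texts on the topic, and
    sentiment is the share of each polarity among opinionated texts.
    """
    on_topic = [t for t in texts if t.topic == topic]
    counts = Counter(t.polarity for t in on_topic if t.polarity is not None)
    opinionated = sum(counts.values())
    shares = {
        p: (counts[p] / opinionated if opinionated else 0.0)
        for p in POLARITIES
    }
    return {"buzz": len(on_topic), "polarity_shares": shares}


# Made-up example data.
texts = [
    LabeledText("phone-x", "personal", "positive"),
    LabeledText("phone-x", "personal", "positive"),
    LabeledText("phone-x", "commercial", "negative"),
    LabeledText("phone-x", "informational", "neutral"),
    LabeledText("phone-x", "informational", None),
    LabeledText("phone-y", "personal", "negative"),
]
result = aggregate(texts, "phone-x")
print(result["buzz"], result["polarity_shares"])
```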

10/28/09

Poster on InfexBA at KI2009

The 32nd Annual Conference on Artificial Intelligence, KI2009, took place in Paderborn from September 15 to September 18, 2009. The s-lab presented two introductory posters at the conference, one of which covered the InfexBA project.

With some delay, the poster is now available here. It presents the main idea, our approaches, the process underlying the technologies, and two small examples of information extraction.

09/25/09

Annotated corpus on revenues

Both the choice of documents for the revenue corpus and the annotation of these documents are complete. We are currently checking each document for annotation mistakes, but this will be finished soon.

The corpus contains 1128 German online news articles from the business domain. A total of 2048 statements on the expected future revenues (forecasts) and the past revenues (declarations) of companies, industries or technologies are annotated as such in these articles. Moreover, for every statement, its matter and author as well as the corresponding time and money expressions are tagged and linked to the annotation of that statement.

We think that this corpus is an excellent starting point for tackling the problem of automatic market analysis. Information extraction (IE) technologies will be used to recognize and understand statements on revenues.

[update 2/9/2010]

06/09/09

Definition of the main functionalities

The main functionalities to be developed in this project have been defined. They all tackle the problem of aggregating and processing information but differ in the kind of information:

a. Declarations: Development of the past revenues of relevant industries.

b. Forecasts: Estimates of the future revenues of relevant industries.

c. Sentiment: Opinions on products, technologies and brands of relevant industries.

d. Buzz: The level of buzz that products, technologies and brands of relevant industries have in social media.

e. Innovations: Number of patents or patent applications of relevant industries and technologies.

05/05/09

Website online

The InfexBA webpage is online.
