Research Projects

.: 2017 - now: Mariage: Extracting and marrying structured Web data with multiple representations

.: 2012 - 2016: VIDAS - Visualization and Integration of Similarity-based Aggregated Data (CAPES/COFECUB - France)

PhD Students

.: Roberto P. Velloso
Extracting Records from the Web Using a Signal Processing Approach
Extracting records from web pages enables a number of important applications and has immense value, given the amount and diversity of information that can be extracted. Although widely studied, the problem remains open because it is far from trivial. Given the scale of the data, a feasible approach must be automatic and efficient as well as effective. We present a novel approach, fully automatic and computationally efficient, that uses signal processing techniques to detect regularities and patterns in the structure of web pages. Our approach segments the web page, detects the data regions within it, identifies the record boundaries and aligns the records. Results show a high F-score and linearithmic (O(n log n)) time complexity.
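To give an idea of what "detecting regularities" in a page's structure can mean, here is a minimal sketch. It is not the method of the thesis. It assumes the page has already been flattened into a sequence of tag names (for example, by a pre-order DOM traversal) and estimates the record period as the lag at which the sequence best matches a shifted copy of itself, a categorical analogue of autocorrelation. The function names and the tag-sequence encoding are hypothetical.

```python
from typing import Sequence


def lag_similarity(tags: Sequence[str], lag: int) -> float:
    """Fraction of positions i where tags[i] equals tags[i + lag].

    A categorical analogue of a signal's autocorrelation at a given lag.
    """
    overlap = len(tags) - lag
    if overlap <= 0:
        return 0.0
    matches = sum(1 for i in range(overlap) if tags[i] == tags[i + lag])
    return matches / overlap


def estimate_record_period(tags: Sequence[str], max_lag: int) -> int:
    """Return the lag in 1..max_lag with the highest lag similarity.

    Multiples of the true period (2p, 3p, ...) can score as high as p,
    so ties are resolved in favour of the smallest lag.
    """
    best_lag, best_score = 1, -1.0
    for lag in range(1, max_lag + 1):
        score = lag_similarity(tags, lag)
        if score > best_score:  # strict '>' keeps the smallest lag on ties
            best_lag, best_score = lag, score
    return best_lag


if __name__ == "__main__":
    # Hypothetical page: a short header followed by four records with the same structure.
    record = ["div", "img", "a", "span"]
    page = ["html", "body", "h1"] + record * 4
    print(estimate_record_period(page, max_lag=8))  # prints 4
```

This naive loop performs O(n · max_lag) comparisons. It illustrates only the periodicity idea and does not reproduce the segmentation, boundary detection, record alignment or linearithmic efficiency described above.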

.: Richard de Souza
Measuring similarity between heterogeneous research questionnaires
Questionnaires are useful research tools and are generally used to collect information about a population of interest, with a variety of intentions. When designing a questionnaire, or when sharing data, it may be useful to check whether a questionnaire with the same intention already exists. Well-designed questions can induce respondents to provide better answers. However, comparing research questionnaires is not a trivial task, since the same question can be structured in different ways. In this paper, we propose a similarity measure for matching questionnaires with heterogeneous questions, together with a ranking method based on variations of a given query. We evaluated the approach in an experimental study using recall, precision, F-measure, MAP and NDCG, in which it obtained more effective results than other proposals.
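MAP and NDCG, two of the ranking metrics mentioned above, can be computed as in the following minimal sketch. It assumes binary relevance judgements for average precision and graded judgements with a linear gain, rel / log2(rank + 1), for NDCG (other gain functions, such as 2^rel - 1, are also common). The function names and the example grades are illustrative and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple


def average_precision(relevant: Sequence[bool], total_relevant: int) -> float:
    """Mean of precision@k over the ranks k that hold a relevant item.

    `total_relevant` counts all relevant items for the query,
    including any that the ranking failed to retrieve.
    """
    if total_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant


def mean_average_precision(runs: Sequence[Tuple[Sequence[bool], int]]) -> float:
    """MAP: average precision averaged over queries."""
    if not runs:
        return 0.0
    return sum(average_precision(rel, total) for rel, total in runs) / len(runs)


def dcg(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the top-k results (linear gain)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg(gains: Sequence[float], k: int) -> float:
    """DCG normalised by the DCG of the ideal (descending) ordering.

    Assumes every judged item appears in `gains`, so the ideal ordering
    is simply the same grades sorted from best to worst.
    """
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical ranking of four questionnaires for one query.
    print(round(average_precision([True, False, True], total_relevant=2), 3))  # prints 0.833
    print(round(ndcg([3, 2, 0, 1], k=4), 3))  # prints 0.985
```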

.: Rodrigo Gonçalves
A multi-layered expertise representation framework
Computationally representing people's expertise supports several tasks, such as expert finding, expert profiling and expert matching. There is a considerable body of work in the literature on automated expertise retrieval, i.e., finding and representing expertise information from existing evidence (documents, electronic messages, etc.). None of this work explores one important problem: expertise representation, i.e., how can a person's expertise be represented in a standard way? In this work we analyze this problem and propose a novel expertise representation framework to support automated expertise representation and exchange.

MSc Students

.: Lucas Knochenhauer
WANQA: An Approach to Identify New Unanswered Questions in Communities of Questions and Answers
The web hosts large knowledge repositories, and Question and Answer Communities (CQAs) are among the most collaborative of them. Every day, their users post a large volume of questions, and many of these receive no answers, becoming useless content. Previous work on this problem depends on characteristics specific to each community. This article proposes a classification-based approach that produces a model able to predict whether a new question is answerable or not. It uses features available in most CQAs. Experiments with data from different CQAs show that the proposal fulfils its goals.

Former Students

My publications are available online at:

Lattes (online CV at CNPq): my always up-to-date list of publications.

Google Scholar


