2023 - 2025: Sustainable High Performance Computing on AWS

The cloud computing paradigm is a pay-per-use model that provides convenient, rapid, on-demand access to a configurable pool of computing resources with minimal effort and minimal interaction with the provider. In the High Performance Computing (HPC) context, the benefits of public cloud resources make them an attractive alternative to expensive on-premise HPC clusters. However, the software ecosystem needed to make a sustainable HPC cloud platform possible is not yet mature. Cost advisors, large contract handlers, DevOps solutions, Application Programming Interfaces (APIs), and HPC-aware resource managers are current software gaps in this regard. In this research project, we propose new solutions to efficiently manage and execute HPC scientific applications and workflows on AWS, with sustainability as a common objective. The proposed open-source tools and optimizations will focus on multiple performance objectives: makespan (how to minimize applications’ execution times), budget (how to choose EC2 standard and spot instances to meet the users’ and applications’ needs with cost savings), energy (how to minimize energy consumption), and fault tolerance (how to provide efficient fault tolerance to HPC applications running on EC2 spot instances).

Funding Amazon Web Services (AWS) and Brazilian National Council for Scientific and Technological Development (CNPq)

2023 - 2028: Céos: Data Intelligence for Society - Assisting Intelligent Decision Making in Complex Domains of the Public Sector

Currently, the vast amount of digital data concerning services and processes involving individuals, businesses, and public institutions presents technological challenges while also representing opportunities to update and enhance the functioning of the organizations responsible for monitoring such data. Without appropriate techniques and methodologies for analyzing and extracting knowledge from the large volume of data available in this scenario, multiple opportunities to improve the efficiency and effectiveness of oversight, regulatory, and defense institutions, such as the State Public Ministry, are lost. Moreover, these institutions are unable to make intelligent decisions that are crucial for society’s quality of life. An example of a problem that could be mitigated using data intelligence is the misappropriation of public resources. This practice is identified by many renowned financial institutions as one of the main causes of a country’s developmental setbacks. Digital intelligence applied in this context creates opportunities for greater control and transparency. Therefore, new techniques and methodologies are necessary due to the massive quantity and significant heterogeneity of data available for analysis. In practice, it is impossible to effectively analyze such a volume of data manually, which favors illicit and criminal activities that go undetected by taking advantage of the anonymity this analytical challenge provides. There are numerous possible research and development avenues regarding data intelligence for society. Through a research partnership between the State Public Ministry of Santa Catarina (MPSC) and the Federal University of Santa Catarina (UFSC), this research project aims to investigate and develop data science techniques and methodologies that can assist in mitigating or solving important specific social problems within the MPSC’s scope. These include detecting irregularities in public purchases and tenders, identifying useful patterns for decision making in the allocation of different types of ICU beds, and automatically detecting information in legal documents to identify inconsistencies and better inform the public about the MPSC’s actions and the value delivered to society. We expect to generate new knowledge, techniques, and methodologies involving data intelligence for society, develop highly qualified human resources to work in this field, and produce scientific dissemination materials that clearly demonstrate the project’s results, highlighting the importance of the MPSC’s and UFSC’s contributions to society.

Funding State Public Ministry of Santa Catarina

2021 - 2024: Solutions to Improve Programmability and Performance of Applications on Lightweight Manycore Processors

The current trend in multicore processors is a continuous growth in the number of processing cores on the chip to meet the computational demands of applications from the most diverse areas of knowledge. Power consumption has become a critical aspect in the development of parallel processors, leading to the emergence of a new class of parallel architectures called lightweight manycores. These processors have hundreds or even thousands of low-frequency processing cores on a single chip, allow the exploitation of task and data parallelism, have limited memories distributed on the chip, and make use of Networks-on-Chip (NoCs) to interconnect cores or groups of processing cores. Research involving this class of processors is still at an early stage of maturity, and their adoption is hampered by their peculiar architectural characteristics. In general, lightweight manycores introduce important challenges for the development of efficient software that can extract the high performance offered by these processors, both at the basic software level, i.e., at the Operating System (OS), and at the user level. Currently, the vast majority of basic software available for these processors is proprietary and has a low degree of abstraction, which makes it difficult to develop efficient applications. Therefore, this project aims to contribute solutions that facilitate the adoption and exploitation of lightweight manycore processors in the most diverse application domains. The main objectives of the project are: (1) propose and develop basic open-source software (at OS and user level) that simplifies the development of applications for lightweight manycore processors; and (2) propose and develop techniques for exploiting the computing power of lightweight manycore processors.

Funding Brazilian National Council for Scientific and Technological Development (CNPq)


2017 - 2019: Adaptive Global Scheduling for Scientific Applications

Science has advanced in the last decades partly by virtue of numerical simulations performed by scientific applications developed in large research centers. Because of their large computing power requirements, these scientific applications are developed using parallel programming languages and interfaces in order to benefit from the computing and memory resources available on High Performance Computing (HPC) platforms. Due to the nature of the simulations, application tasks can possess different computational loads, complex communication graphs, or both. These irregular and dynamic behaviors result in load imbalance and communication overhead that affect the applications’ performance and scalability. In this context, the main objective of this Universal Research Project is to automate, in an adaptive manner, the process of choosing a global scheduling algorithm for scientific applications running on parallel platforms. In this way, this project will contribute to automatically increasing the performance of applications on parallel platforms for end users, which leads to faster results, bigger simulations, and the release of human resources that will be able to focus on activities other than handling performance problems related to work distribution.

Funding Brazilian National Council for Scientific and Technological Development (CNPq)

2016 - 2017: EnergySFE: Energy-aware Scheduling and Fault Tolerance Techniques for the Exascale Era

The EnergySFE research project aims at proposing fast and scalable energy-aware scheduling and fault tolerance techniques and algorithms for large-scale highly parallel architectures. The main skills of the different international partners will be of great significance to the success of the project: LAPESD and ECL from UFSC (Brazil), CORSE from LIG/CNRS (France), GPPD and LSE from UFRGS (Brazil), and SAPyC from ESPE (Ecuador). The project will be carried out following a methodology that combines theoretical and practical aspects. The techniques and algorithms developed during the project will be applied to real-world scientific applications. The energy and performance improvements obtained from the proposed techniques and algorithms will be evaluated by executing the applications on highly parallel architectures composed of tens or thousands of cores. Overall, the main goals of EnergySFE are the following: (i) Establish a perennial collaboration between UFSC, CNRS, UFRGS, and ESPE and promote knowledge transfer between these institutions; (ii) Study the impact of current scheduling and fault tolerance techniques on the performance and energy consumption of scientific applications; (iii) Propose new energy-aware scheduling algorithms adapted to highly parallel architectures; (iv) Propose new energy-aware fault tolerance approaches adapted to highly parallel architectures; (v) Apply the proposed scheduling and fault tolerance approaches to real-world scientific applications and carry out experiments on highly parallel architectures composed of tens or thousands of cores; (vi) Disseminate the results in high-quality peer-reviewed international journals and conferences in the HPC domain.

Funding Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Centre National de la Recherche Scientifique (CNRS), Secretaría de Educación Superior, Ciencia, Tecnología e Innovación (SENESCYT)

2014: Exascale Computing: Scheduling and Energy (ExaSE)

The main scientific context of this project is high performance computing on Exascale systems: large-scale machines with billions of processing cores and complex hierarchical structures. This project intends to explore the inherent relationship between scheduling algorithms and techniques and energy constraints on such exascale systems. International cooperation project: UFRGS, PUC Minas and INRIA. Coordinators: Jean-Marc Vincent (INRIA), Nicolas Bruno Maillard (UFRGS) and Henrique Cota de Freitas (PUC Minas).

Funding Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul (FAPERGS), Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG) and Institut National de Recherche en Informatique et en Automatique (INRIA)

2012: High Performance Computing for Geophysics Applications (HPC-GA)

Simulating large-scale geophysical phenomena is, more than ever, a major concern for our society. Recent seismic activity worldwide has shown how crucial it is to enhance our understanding of the impact of earthquakes. Numerical modeling of 3D seismic waves requires highly specific research efforts in geophysics and applied mathematics, leveraging a mix of schemes such as spectral elements, high-order finite differences, or finite elements. But designing and porting geophysics applications to today’s supercomputers also requires strong expertise in parallel programming and the use of appropriate runtime systems able to deal efficiently with heterogeneous architectures featuring many-core nodes typically equipped with GPU accelerators. The HPC-GA project aims at evaluating the functionalities provided by current runtime systems in order to point out their limitations. It also aims at designing new methods and mechanisms for efficient scheduling of processes/threads and effective data distribution on such platforms. The HPC-GA project is unique in gathering an international, multidisciplinary consortium of leading European and South American researchers with complementary expertise to face the challenge of designing high performance geophysics simulations for parallel architectures: UFRGS, INRIA, BCAM and UNAM. Results of this project will be validated using data collected from real sensor networks. Results will be widely disseminated through high-quality publications, workshops, and summer schools.

Funding 7th Framework Programme for Research - International Research Staff Exchange Scheme (IRSES)

2010 - 2011: Skeleton-Enabled Thread Scheduling and Memory Affinity Policies for Transactional Memory Applications on Multi-core NUMA Machines

In this research project, we investigate skeleton-enabled thread scheduling and memory affinity policies for transactional memory applications on multi-core NUMA machines. To accomplish this goal, we combine the OpenSkel system, a transactional skeleton framework developed at the University of Edinburgh, with MAi, an interface developed at INRIA - Laboratoire d’Informatique de Grenoble (LIG) that uses memory policies to better place data and threads on ccNUMA platforms. We expect that the combination of these tools will improve the performance of transactional skeleton applications on NUMA machines without increasing the parallel programming effort from an application programmer’s perspective. We will test and evaluate our proposed system with the STAMP Benchmark applications on three different Intel and AMD NUMA machines.

Funding European Network of Excellence on High Performance and Embedded Architecture and Compilation (HiPEAC)

2010 - 2011: Characterization and Evaluation of Parallel Workloads for Many-core Architectures (CEPMany)

The next generation of many-core processors has the capacity to support both shared memory and message passing programming models. However, it is important to know the impact of these programming models on chips with a high number of cores. Workload characterization mainly aims to answer questions about application size, network packets, protocol influence, communication patterns, etc. This cooperation proposal between PUC Minas-FAPEMIG and LIG-INRIA aims at the exchange of experience and knowledge between the two countries.

Funding Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG)

2008 - 2011: Observation and Analysis of Multithreaded Applications on Multi-core Processors (OPM2)

This project concerned the observation and analysis of parallel programming environments (POSIX threads, component-based programming models, and Software Transactional Memory) in the context of embedded systems. By the end of this project, three tools had been developed: a generic tool to trace and analyze component-based parallel applications, a generic tool to trace relevant events of Transactional Memory applications, and a tool to debug parallel applications using replay techniques.

Funding STMicroelectronics

2005: Simulation of Electron Dynamics in Field Emission Displays (SDE-FED)

This project aimed at developing high performance software to simulate the electron dynamics in Field Emission Displays. Considering the high demand for processing power, the parallel approach was conceived for a cluster platform using the message passing paradigm.

Funding HP Brazil R&D

2004 - 2006: High Performance Techniques for the XSL-FO Documents Ripping to VDP (ADR-VDP)

The main objective of this project was to create a robust, portable, and scalable tool with good usability for the parallel rendering of VDP documents in industrial printing environments. The biggest challenge was to research and refine the optimizations to the parallel FOP tool efficiently, so that the result would be a finished, tested product with satisfactory performance.

Funding HP Brazil R&D

2004 - 2006: Centro de Pesquisa e Desenvolvimento de Aplicações Paralelas (CAP)

The Research and Development Center of Parallel Applications (CAP - PUCRS/HP Brazil) is dedicated to the study of high performance computing techniques and methodologies applied to the development of solutions for computationally intensive applications. The CAP project has two main research lines: (a) support techniques for parallel program design and (b) development of high performance applications for distributed memory architectures. (a) Support techniques for the design of parallel programs are not directly related to the parallel solution of a given problem, but they offer alternatives to simplify and optimize the process of developing parallel programs. In this scenario, the CAP team is interested in research topics such as: (i) Analytical modeling of parallel programs using Stochastic Automata Networks (SAN); (ii) Formal verification of properties of parallel and distributed programs using Objects-Based Graph Grammars (GGBO); (iii) Load-balancing algorithms for dynamic irregular applications on distributed memory platforms; (iv) Structural test methodologies for parallel programs. (b) The development of parallel applications for different categories of distributed memory architectures, such as heterogeneous clusters or computational grids, is another major research line of the CAP team. Our focus is the development of new high performance algorithms and/or programs for scientific or industrial problems using the message passing programming paradigm. Recently, the CAP group has been working on parallel solutions for the following applications: (i) Document rendering tool for high-speed printers (FOP - Formatting Objects Processor); (ii) Visualization of medical data for image-based diagnosis; (iii) Simulation of electron trajectories in Field Emission Displays.

Funding HP Brazil R&D