

# **OPTIMIZATION/ACCELERATION OF BLOCK LU DECOMPOSITION ON FPGA.**

## ABSTRACT

LU decomposition is used in high-performance computing and in solving large systems of linear equations. This project accelerates the LU decomposition algorithm using a blocking method. Traditional LU decomposition implementations have an extremely complex architecture and are highly resource intensive, because they compute large inverse matrices and large matrix multiplications. Our project suggests an alternative implementation of block LU decomposition. It obtains the LU factors of a large nonsingular square matrix by decomposing square submatrices. The decomposition is obtained step by step, with the intermediate steps computing partial sums. This in turn results in a simpler architecture.

## OBJECTIVE

To implement a simpler and more robust architecture for block LU decomposition on an FPGA, based on the calculation of partial sums. Below is the block diagram.



#### Fig-1 Block diagram of the architecture.
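The poster does not spell out the block equations, so the following is one standard way to write block LU decomposition in terms of partial sums, given here as an assumption rather than as the exact formulation used in the design. The symbols are illustrative: $A$ is partitioned into $N \times N$ square blocks $A_{ij}$ of size $b \times b$, and $L_{ij}$, $U_{ij}$ are the corresponding blocks of the factors. For each block step $k = 1, \dots, N$:

$$
\begin{aligned}
L_{kk} U_{kk} &= A_{kk} - \sum_{p=1}^{k-1} L_{kp} U_{pk}, \\
U_{kj} &= L_{kk}^{-1} \Bigl( A_{kj} - \sum_{p=1}^{k-1} L_{kp} U_{pj} \Bigr), \qquad j > k, \\
L_{ik} &= \Bigl( A_{ik} - \sum_{p=1}^{k-1} L_{ip} U_{pk} \Bigr) U_{kk}^{-1}, \qquad i > k.
\end{aligned}
$$

In this form, only $b \times b$ blocks are ever factored or inverted, and each sum can be accumulated one block product at a time.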

## R&D SHOWCASE 2021 **Technology, Social Impact**

## METHOD

The implementation is inspired by computing matrix multiplication via inner products. The same approach is extended to computing the LU decomposition. Below is the structure of the design.



**Fig-2 Datapath of the implementation** 
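To illustrate the inner-product idea described above, here is a minimal software sketch in Python. It is not the FPGA design: it assumes a scalar Doolittle formulation without pivoting, and the function name `lu_inner_product` is hypothetical. Each entry of `L` and `U` is obtained by subtracting an accumulated inner product (a partial sum) from the matching entry of `A`.

```python
def lu_inner_product(a):
    """Doolittle LU without pivoting (illustrative sketch, not the FPGA design).

    Returns (lower, upper) with `lower` unit lower triangular, so that
    lower @ upper equals `a` for a nonsingular matrix with nonzero pivots.
    """
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[0.0] * n for _ in range(n)]

    for k in range(n):
        # Row k of U: subtract the inner product of L's row k and U's column j.
        for j in range(k, n):
            partial = sum(lower[k][p] * upper[p][j] for p in range(k))
            upper[k][j] = a[k][j] - partial

        if upper[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot: this sketch does no pivoting")

        # Column k of L: same partial-sum pattern, then divide by the pivot.
        for i in range(k + 1, n):
            partial = sum(lower[i][p] * upper[p][k] for p in range(k))
            lower[i][k] = (a[i][k] - partial) / upper[k][k]

    return lower, upper


if __name__ == "__main__":
    l_mat, u_mat = lu_inner_product([[2.0, 1.0, 1.0],
                                     [4.0, 3.0, 3.0],
                                     [8.0, 7.0, 9.0]])
    print(l_mat)
    print(u_mat)
```

The two inner loops share the same multiply-accumulate pattern, which is what makes a formulation like this a natural fit for a datapath built around inner products.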

Research Center Name: Computer Systems Group.





