Approximate multiplication for integer constant division and remainder [Available September 2026]

Integer division and remainder are relatively complicated, expensive operations. Most modern processors have at least one integer multiplier unit that has both low latency (typically between 3 and 5 cycles) and high throughput (typically the unit can complete a multiply operation every cycle). In contrast, most modern processors do not have a fast integer division … Read more

Minimum constant big integer multiplication using fixed-size multiplication units [Available for September 2026]

Almost all modern processors provide special-purpose instructions for integer multiplication. However, some applications require integer values that are larger than fit in a single machine word, such as 256 bit integers. We can multiply these big integer values using a sequence of smaller multiplies. For example, multiplying two 256-bit integer values might be completed with … Read more

Minimum hardware constant multiplication with integer linear programming [Available for September 2026]

In some applications, such as many cryptography algorithms, multiplication by a known integer constant is a common operation. It’s well known that integer multiplication with a constant can be computed with a sequence of shifts and adds. Most modern processors already have fast integer multiplication units, so in software it is usually faster to use … Read more

In-place matrix multiplication for rectangular matrices [Available for September 2026]

Matrix multiplication is one of the commonly-used computations across a wide range of applications. For example, most implementations of neural networks implement very large numbers of matrix multiplications. Almost all processor and GPU manufacturers provide libraries with carefully hand-tuned fast matrix multiplication routines. All the best-known algorithms matrix multiplication are out-of-place; that is, the result … Read more

An extension to OpenMP for Superscalar Processors [Taken]

[Note: this is quite a difficult project. To take the project, you should already have experience with OpenMP, or have an interest in parallel computing, or just be a very good programmer.] OpenMP is a small domain-specific language for describing parallelism in C, C++ and Fortran programs. OpenMP was originally designed to express parallelism using … Read more

In-place matrix multiplication [Taken]

Matrix multiplication is one of the commonly-used computations across a wide range of applications. For example, most implementations of neural networks implement very large numbers of matrix multiplications. Almost all processor and GPU manufacturers provide libraries with carefully hand-tuned fast matrix multiplication routines. All the best-known algorithms matrix multiplication are out-of-place; that is, the result … Read more

A generator for hardware modular reduction units [Taken]

Many cryptographic schemes rely on modular arithmetic. Modular arithmetic is a system of arithmetic operators on integers where the numbers “wrap around” back to zero when they reach a value called the modulus. For example, if our modulus is 17, then (12 + 14) mod 17 = 9. When we implement modular arithmetic, a key … Read more

Efficient in-place sparse matrix transpose [Taken]

A sparse matrix is a two-dimensional array where most of the elements are zero. If a very large sparse matrix is stored in normal array format, a huge amount of memory will be needed to store values that are almost all zero. Instead, large sparse matrices are normally stored in a compacted form, where only … Read more

Automatic tuning of logic synthesis optimization scripts [Taken]

ABC is an open-source logic synthesis and optimization tool developed by Alan Mishchenko at the University of California at Berkeley. ABC specializes in optimizing digital logic circuits at the circuit level, and some commercial hardware synthesis tools incorporate ABC as part of their design flow to improve their circuit optimization. ABC is a large logic … Read more