Klint Qinami | Software Engineer

About

I’m a software engineer interested in performance optimization for large, real-world systems. I’m currently a member of technical staff at Anthropic.

Before Anthropic, I was a machine learning systems engineer at Meta, building compilers, frameworks, and kernels for training and inference on MTIA (Meta Training and Inference Accelerator) within the PyTorch organization. Before that, I built compiler toolchains at the startup Reservoir Labs and continued that work at Qualcomm following its acquisition of Reservoir Labs, focusing on machine-learning compilers for wide-vector VLIW DSP accelerators. Earlier, I was a Ph.D. student at Princeton studying bias mitigation in machine learning, and as an undergraduate at Columbia I worked on computer graphics, physics-based simulation, and geometry processing.

Projects

SESE Regions. Python implementation of the Johnson-Pearson-Pingali algorithm for canonical single-entry/single-exit regions and program structure trees, with Graphviz exporters for CFG and region visualization.
Offline PlantID. SwiftUI iOS app for offline plant identification using a TensorFlow Lite model trained on iNaturalist data, with on-device inference.
IMDb Movie Toolkit. CLI tool that aggregates IMDb titles by year, with filters for votes, ratings, genres, title type, and runtime, and a choice of output formats. HTML sample
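The SESE Regions project rests on cycle equivalence. In the Johnson-Pearson-Pingali formulation, an edge from the exit back to the entry is added to the CFG, two edges are cycle equivalent when every cycle through one also passes through the other, and canonical SESE regions are bounded by pairs of edges from the same equivalence class. The Python sketch below is a brute-force illustration, not the project's code and not the linear-time bracket-list algorithm. It assumes blocks numbered 0..n-1, with every block reachable from the entry and able to reach the exit, and it relies on the fact that, once the back edge makes the CFG strongly connected, two edges are cycle equivalent exactly when removing both disconnects the undirected graph. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def is_connected(num_nodes, edges, removed):
    """Undirected reachability from node 0, ignoring edges whose index is in `removed`."""
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        if idx not in removed:
            adj[u].append(v)
            adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == num_nodes


def cycle_equivalence_classes(num_nodes, cfg_edges, entry, exit_):
    """Brute-force cycle-equivalence classes of CFG edges, O(E^2 * (V + E))."""
    # Assuming every block is reachable from entry and reaches exit, the back
    # edge exit -> entry makes the CFG strongly connected, so its undirected
    # version has no bridges (it is 2-edge-connected).
    edges = list(cfg_edges) + [(exit_, entry)]
    parent = list(range(len(edges)))  # union-find over edge indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(range(len(edges)), 2):
        # In a 2-edge-connected graph, edges a and b are cycle equivalent
        # exactly when removing both of them disconnects the graph.
        if not is_connected(num_nodes, edges, {a, b}):
            parent[find(a)] = find(b)

    classes = defaultdict(list)
    for i, edge in enumerate(edges):
        classes[find(i)].append(edge)
    return sorted(classes.values())


# Blocks 0..5: straight-line entry and exit around an if/else diamond 1 -> {2, 3} -> 4.
cfg = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
classes = cycle_equivalence_classes(6, cfg, entry=0, exit_=5)
print(classes)  # [[(0, 1), (4, 5), (5, 0)], [(1, 2), (2, 4)], [(1, 3), (3, 4)]]
```

In this example, edges (0, 1) and (4, 5) share a class and bracket the whole diamond, while each arm's two edges bracket that arm, which is the nesting a program structure tree records. The quadratic search is only for exposition; the JPP algorithm computes the same classes in linear time.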

Publications

Towards Fairness in Visual Recognition: Effective Strategies for Bias Mitigation. Zeyu Wang, Klint Qinami, Ioannis Karakozis, Kyle Genova, Prem Nair, Kenji Hata, and Olga Russakovsky. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy. Kaiyu Yang, Klint Qinami, Li Fei-Fei, Jia Deng, and Olga Russakovsky. ACM Conference on Fairness, Accountability and Transparency (FAccT), 2020.

Talks

Compiler-Driven Performance Optimization for Neural Networks. Klint Qinami. CDP Workshop 2025. Compiler optimization techniques developed for MTIA's next-generation architecture.

Abstract

We present compiler optimization techniques developed for MTIA's next-generation architecture, which delivers a 3x performance improvement over the previous generation. Evaluation on production ranking and recommendation models shows significant improvements in memory utilization and overall system efficiency. The techniques contribute to MTIA's 6x improvement in model-serving throughput and 1.5x gain in performance per watt over the previous generation, enabling Meta to efficiently serve recommendation workloads ranging from low to high complexity, with model sizes that differ by 10x-100x. We describe a multi-stage compilation pipeline that builds on PyTorch's Inductor backend and adds novel graph-level optimizations tailored to AI accelerators. Our approach addresses several key challenges with three techniques: (1) tensor view elimination, which converts explicit layout transformations into implicit tensor view manipulations; (2) memory-aware operator fusion, which weighs computational efficiency against memory hierarchy constraints; and (3) dynamic shape handling, which preserves optimized code paths despite runtime shape variability.
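To illustrate the general idea behind tensor view elimination (this is not the MTIA compiler's implementation), the NumPy sketch below treats a permute as a change to shape and stride metadata rather than a data copy. The helper names fold_permute and fold_permutes are hypothetical, and the sketch assumes permutes only; a real pass would also have to handle reshapes, slices, and consumers that require a contiguous layout.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def fold_permute(shape, strides, perm):
    """Express a permute as view metadata: reorder shape and strides, copy no data."""
    return tuple(shape[p] for p in perm), tuple(strides[p] for p in perm)


def fold_permutes(shape, strides, perms):
    """Compose a chain of permutes into one view over the original buffer."""
    for perm in perms:
        shape, strides = fold_permute(shape, strides, perm)
    return shape, strides


x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
perms = [(2, 0, 1), (0, 2, 1)]

# Explicit layout transformations: every permute materializes a contiguous copy.
eager = x
for perm in perms:
    eager = np.ascontiguousarray(np.transpose(eager, perm))

# View version: fold the permutes into strides and reinterpret x's buffer in place.
shape, strides = fold_permutes(x.shape, x.strides, perms)
view = as_strided(x, shape=shape, strides=strides)

print(view.shape, view.strides)     # (4, 3, 2) (4, 16, 48)
print(np.array_equal(view, eager))  # True
print(np.shares_memory(view, x))    # True: no data was copied
```

The design point is that a consumer reading through the folded strides sees exactly the values the eager copies would have produced, so the copies can be dropped whenever that consumer can handle non-contiguous strides.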

The compiler uses memory placement strategies that automatically partition tensors between fast on-chip SRAM and external DRAM based on access patterns and lifetime analysis, backed by fallback strategies. When SRAM capacity is exceeded, spilling mechanisms migrate data to DRAM while minimizing the performance impact. We also apply scheduling and tiling optimizations that decompose large tensor operations into smaller blocks that fit within memory constraints while maximizing data reuse. Additionally, graph-level transformations simplify and canonicalize graphs, eliminate redundant operations, and support both vertical and horizontal fusion to improve compute density.
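A toy version of lifetime-based SRAM/DRAM placement with spilling is sketched below. It is a hypothetical greedy heuristic, not the compiler described in the talk: each tensor has a size and a live interval over a linear schedule, tensors are considered in order of accesses per byte, and a tensor is spilled to DRAM whenever adding it would exceed SRAM capacity at any step of its lifetime. The Tensor fields and the place function are assumptions for illustration, and the sketch ignores address assignment and fragmentation, which a real allocator must handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tensor:
    name: str
    size: int      # bytes
    start: int     # first schedule step that uses the tensor
    end: int       # last schedule step that uses it (inclusive)
    accesses: int  # reads + writes over its lifetime


def place(tensors, sram_capacity, num_steps):
    """Greedy SRAM/DRAM placement driven by live intervals (hypothetical heuristic)."""
    usage = [0] * num_steps  # SRAM bytes in use at each schedule step
    placement = {}
    # Hottest tensors first: most accesses per byte of SRAM they would occupy.
    for t in sorted(tensors, key=lambda t: t.accesses / t.size, reverse=True):
        live = range(t.start, t.end + 1)
        if all(usage[step] + t.size <= sram_capacity for step in live):
            for step in live:
                usage[step] += t.size
            placement[t.name] = "SRAM"
        else:
            # Spill: the tensor would overflow SRAM at some step of its lifetime.
            placement[t.name] = "DRAM"
    return placement


tensors = [
    Tensor("a", size=60, start=0, end=2, accesses=10),
    Tensor("b", size=50, start=1, end=3, accesses=20),
    Tensor("c", size=40, start=3, end=4, accesses=4),
]
print(place(tensors, sram_capacity=100, num_steps=5))
```

Here b has the highest access density and takes SRAM for steps 1-3. Tensor a overlaps b at steps 1-2 and would need 110 bytes there, so it is spilled to DRAM, while c fits alongside b at step 3 (90 bytes).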

Undergraduate Work

Computer Science

QClang: High-Level Quantum Computing Language. Klint Qinami, Connor Abbott (2018). A research prototype language and compiler for expressing quantum programs at a high level. GitHub
Quantifying Geometric Entanglement: the Linking Number of Two Open Curves. Klint Qinami, Eitan Grinspun (2017). A computational geometry study connecting curve topology with linking number estimation.
Blis: Better Language for Image Stuff. Connor Abbott, Wendy Pan, Klint Qinami, Jason Vaccario (2017). A small language and toolchain for concise image-processing programs. GitHub
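As background for the linking-number study listed above: the linking number of two closed curves can be computed with Gauss's double integral, Lk = (1/4π) ∮∮ (γ1(s) − γ2(t)) · (γ1′(s) × γ2′(t)) / |γ1(s) − γ2(t)|³ ds dt, and the same integral taken over open curves gives a real-valued measure of entanglement. The NumPy sketch below approximates it for polylines with a midpoint rule over all segment pairs. It is an illustration under those assumptions, not the method from the 2017 study, and the function name gauss_linking is hypothetical.

```python
import numpy as np


def gauss_linking(curve1, curve2):
    """Midpoint-rule estimate of the Gauss linking integral for two polylines.

    curve1, curve2: arrays of shape (n, 3) and (m, 3) of polyline vertices.
    For closed curves (first vertex repeated at the end) the value approaches
    an integer; for open curves it is a real number.
    """
    d1, d2 = np.diff(curve1, axis=0), np.diff(curve2, axis=0)  # segment vectors
    m1 = 0.5 * (curve1[:-1] + curve1[1:])                       # segment midpoints
    m2 = 0.5 * (curve2[:-1] + curve2[1:])
    r = m1[:, None, :] - m2[None, :, :]               # all pairwise differences
    cross = np.cross(d1[:, None, :], d2[None, :, :])  # d1_i x d2_j for every pair
    integrand = np.einsum("ijk,ijk->ij", r, cross) / np.linalg.norm(r, axis=2) ** 3
    return integrand.sum() / (4 * np.pi)


t = np.linspace(0.0, 2.0 * np.pi, 401)  # 400 segments, closed (endpoint repeated)
circle_xy = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
# A unit circle in the xz-plane passing through the center of circle_xy: a Hopf link.
circle_xz = np.stack([1.0 + np.cos(t), np.zeros_like(t), np.sin(t)], axis=1)
far = circle_xz + np.array([5.0, 0.0, 0.0])  # same circle, moved out of the link

print(round(gauss_linking(circle_xy, circle_xz), 3))  # magnitude close to 1
print(round(gauss_linking(circle_xy, far), 3))        # close to 0
```

The sign of the linked result depends on the curves' orientations. Passing arrays that do not repeat the first vertex gives open curves, for which the same estimate is generally not an integer.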

Math

Algebraic Topology. A set of notes on homology, cohomology, and fundamental groups.
Point-Set Topology. Notes covering metric spaces, continuity, and compactness.
Modern Algebra. Notes on groups, rings, and fields with worked examples.

Physics

Superconductivity Lab Software. Control and analysis tools for a superconductivity lab experiment. GitHub
Quantum Hall Effect Software. Data acquisition and visualization software for quantum Hall measurements. GitHub

Elsewhere

GitHub - Repos, tools, and source code.
Stack Exchange - Math answers and discussions.