CVPR (2020)
Imagine a world in which each photo, printed or digitally displayed, hides arbitrary digital data that can be accessed through an internet-connected imaging system. Another way to think about this is physical photographs that have unique QR codes invisibly embedded within them. We present StegaStamp, the first steganographic algorithm to enable robust encoding and decoding of arbitrary hyperlink bitstrings into photos in a manner that approaches perceptual invisibility. StegaStamp comprises a deep neural network that learns an encoding/decoding algorithm robust to image perturbations that approximate the space of distortions resulting from real printing and photography. Our system prototype demonstrates real-time decoding of hyperlinks for photos from in-the-wild video subject to real-world variation in print quality, lighting, shadows, perspective, occlusion and viewing distance.
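To make the idea concrete, here is a minimal training-loop sketch in the spirit of StegaStamp, not the released implementation: an encoder hides a bitstring in an image as a residual, a differentiable stand-in for print-and-photograph distortions corrupts the encoded image, and a decoder is trained to recover the bits. All network shapes, the perturbation set, and the loss weights are illustrative assumptions.

```python
# Minimal sketch (not the released StegaStamp code): encoder hides a bitstring
# as an image residual, a differentiable "corruption" layer approximates
# print-and-photograph distortions, and a decoder recovers the bits.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, msg_bits=100, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.msg_fc = nn.Linear(msg_bits, img_size * img_size)   # broadcast message spatially
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),                      # residual added to the image
        )

    def forward(self, img, msg):
        m = self.msg_fc(msg).view(-1, 1, self.img_size, self.img_size)
        return img + self.conv(torch.cat([img, m], dim=1))

class Decoder(nn.Module):
    def __init__(self, msg_bits=100, img_size=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * (img_size // 2) ** 2, msg_bits),      # one logit per bit
        )

    def forward(self, img):
        return self.net(img)

def perturb(img):
    """Differentiable stand-ins for camera/print distortions (illustrative)."""
    img = img + 0.02 * torch.randn_like(img)            # sensor noise
    img = F.avg_pool2d(img, 3, stride=1, padding=1)     # defocus blur
    b = torch.empty(1).uniform_(0.9, 1.1)               # brightness/contrast shift
    c = torch.empty(1).uniform_(-0.1, 0.1)
    return (img * b + c).clamp(0, 1)

enc, dec = Encoder(), Decoder()
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)
img = torch.rand(8, 3, 64, 64)                          # stand-in training batch
msg = torch.randint(0, 2, (8, 100)).float()
encoded = enc(img, msg)
loss = F.binary_cross_entropy_with_logits(dec(perturb(encoded)), msg) \
     + 10.0 * F.mse_loss(encoded, img)                  # bit recovery + invisibility
loss.backward(); opt.step()
```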
CVPR (2020)
We present a deep learning solution for estimating the incident illumination at any 3D location within a scene from an input narrow-baseline stereo image pair. Previous approaches for predicting global illumination from images either predict just a single illumination for the entire scene, or separately estimate the illumination at each 3D location without enforcing that the predictions are consistent with the same 3D scene. Instead, we propose a deep learning model that estimates a 3D volumetric RGBA model of a scene, including content outside the observed field of view, and then uses standard volume rendering to estimate the incident illumination at any 3D location within that volume. Our model is trained without any ground truth 3D data and only requires a held-out perspective view near the input stereo pair and a spherical panorama taken within each scene as supervision, as opposed to prior methods for spatially-varying lighting estimation, which require ground truth scene geometry for training. We demonstrate that our method can predict consistent spatially-varying lighting that is convincing enough to plausibly relight and insert highly specular virtual objects into real images.
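The core rendering step is standard front-to-back alpha compositing over the predicted RGBA volume. A minimal sketch, assuming RGBA samples have already been gathered along one ray from the query point; repeating this over a sphere of directions yields an environment map at that 3D location:

```python
# Standard volume-rendering step: given RGBA samples along a ray from a 3D
# query point, front-to-back alpha compositing yields the incident radiance
# from that direction. Sampling details are assumed.
import numpy as np

def composite_ray(rgb, alpha):
    """rgb: (N, 3), alpha: (N,) samples ordered near-to-far along one ray."""
    radiance = np.zeros(3)
    transmittance = 1.0
    for c, a in zip(rgb, alpha):
        radiance += transmittance * a * c    # light contributed at this sample
        transmittance *= (1.0 - a)           # fraction of light still unblocked
    return radiance

# Example: a mostly transparent volume with one bright, nearly opaque region.
rgb = np.array([[0.1, 0.1, 0.1], [0.9, 0.8, 0.6], [0.2, 0.2, 0.2]])
alpha = np.array([0.05, 0.9, 0.5])
print(composite_ray(rgb, alpha))
```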
Vehicles, search and rescue personnel, and endoscopes use flashlights to locate, identify, and view objects in their surroundings. Here we take the first steps toward performing all of these tasks around corners with consumer cameras. We introduce a method that couples traditional geometric understanding with data-driven techniques. To avoid gathering a large real-world dataset, we train the data-driven models on rendered samples and then computationally recover the hidden scene from real data. The method has three independent operating modes: 1) a regression output that localizes a hidden object in 2D, 2) an identification output that identifies the object's type or pose, and 3) a generative network that reconstructs the hidden scene from a new viewpoint.
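For intuition, the three modes can be pictured as three heads on a shared convolutional backbone. The sketch below is an illustrative assumption about the structure, not the paper's exact architecture:

```python
# Illustrative sketch of the three operating modes as heads on a shared
# backbone: a 2D-localization regressor, an object/pose classifier, and a
# convolutional decoder that reconstructs the hidden scene. Layer sizes are
# assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class HiddenSceneNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(             # shared features from the flash image
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.localize = nn.Linear(64 * 16, 2)             # mode 1: (x, y) of hidden object
        self.classify = nn.Linear(64 * 16, num_classes)   # mode 2: object type / pose
        self.generate = nn.Sequential(                    # mode 3: reconstruct hidden view
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        feat = self.backbone(x)                    # (B, 64, 4, 4)
        flat = feat.flatten(1)
        return self.localize(flat), self.classify(flat), self.generate(feat)

xy, cls, recon = HiddenSceneNet()(torch.rand(1, 3, 64, 64))
print(xy.shape, cls.shape, recon.shape)            # (1, 2) (1, 10) (1, 3, 16, 16)
```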
Nature Photonics Vol. 13 (2018)
We demonstrate that the imaging optics of an ultrafast camera (or a depth camera) can be dramatically different from the imaging optics of a conventional photography camera. More specifically, we demonstrate that by folding the optical path in time, one can collapse the conventional photography optics into a compact volume or multiplex various functionalities into a single imaging optics piece without losing spatial or temporal resolution. By using time-folding at different regions of the optical path, we achieve an order of magnitude lens tube compression, ultrafast multi-zoom imaging, and ultrafast multi-spectral imaging. Each demonstration was done with a single image acquisition without moving optical components.
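As a back-of-the-envelope illustration of time folding, assuming a two-mirror cavity of length L: light that makes k round trips accumulates an extra 2kL of propagation, so an ultrafast camera can select a long effective optical path from a short physical tube by gating on arrival time. The numbers below are illustrative:

```python
# Back-of-the-envelope for time folding, assuming a two-mirror cavity of
# length L. Each round trip adds 2*L of propagation; a time-gated camera
# separates the folds by arrival time.
C = 3e8                                   # speed of light, m/s
L = 0.05                                  # assumed 5 cm cavity
for k in range(4):
    extra_path = 2 * k * L                # meters of folded propagation
    gate = extra_path / C * 1e12          # arrival-time offset in picoseconds
    print(f"{k} round trips -> +{extra_path * 100:.0f} cm path, gate at {gate:.1f} ps")
```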
ICCP (2018)
Imaging through fog has important applications in industries such as self-driving cars, augmented driving, airplanes, helicopters, drones, and trains. Current solutions are based on radar, which suffers from poor resolution (due to the long wavelength), or on time gating, which suffers from a low signal-to-noise ratio. Here we demonstrate a technique that recovers the reflectance and depth of a scene obstructed by dense, dynamic, and heterogeneous fog. For practical use in self-driving cars, the imaging system is designed in optical reflection mode with a minimal footprint and is based on LIDAR hardware. Specifically, we use a single photon avalanche diode (SPAD) camera that time-tags individual detected photons. A probabilistic computational framework is developed to estimate the fog properties from the measurement itself and distinguish between background photons reflected from the fog and signal photons reflected from the target.
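A rough sketch of the idea behind the framework, under the simplifying assumption that fog backscatter arrival times follow a Gamma distribution whose parameters are fit from the measurement itself; the residual of the time histogram then reveals the target's time of flight. The synthetic photon counts below are made up for illustration:

```python
# Sketch: model per-pixel photon arrival times from fog backscatter as a Gamma
# distribution estimated from the measurement, subtract the expected fog counts
# from the time histogram, and read off the target's time of flight from the
# residual peak. Synthetic data is an illustrative assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
fog_times = rng.gamma(shape=2.0, scale=1.5, size=5000)      # backscatter photons
signal_times = rng.normal(loc=8.0, scale=0.05, size=300)    # target return at t = 8
times = np.concatenate([fog_times, signal_times])

# Estimate fog statistics from the full measurement (fog photons dominate).
shape, loc, scale = stats.gamma.fit(times, floc=0)

bins = np.linspace(0, 15, 301)
hist, edges = np.histogram(times, bins=bins)
centers = 0.5 * (edges[:-1] + edges[1:])
expected_fog = len(times) * np.diff(stats.gamma.cdf(edges, shape, loc, scale))

residual = hist - expected_fog                  # signal photons stand out here
t_target = centers[np.argmax(residual)]
print(f"estimated target time of flight: {t_target:.2f}")   # ~8.0
```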
Optics Express Vol. 25, 17466-17479 (2017)
We present a deep learning method for object classification through scattering media. Traditional techniques for seeing through scattering media rely on a physical model that has to be precisely calibrated. Computationally overcoming the scattering depends heavily on such physical models and on the calibration accuracy, so these systems are extremely sensitive to a lengthy and precise calibration process. Our method instead trains on synthetic data with variations in the calibration parameters, which allows the network to learn a model that is invariant to the calibration of the lab experiment.
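The training strategy amounts to domain randomization over calibration parameters. A toy sketch, where the scattering forward model is a stand-in blur plus detector gain and offset, all jittered per example (none of these specifics come from the paper):

```python
# Illustrative sketch of the training idea (domain randomization): render each
# synthetic example with calibration parameters jittered around their nominal
# values, so the classifier learns features that survive miscalibration.
# The forward model below is a toy stand-in for the real scattering simulation.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)

def render_through_scattering(obj, sigma_blur, gain, offset):
    """Toy forward model: scattering as heavy blur plus detector gain/offset."""
    return gain * gaussian_filter(obj, sigma_blur) + offset

def synth_example(obj):
    # Jitter the "calibration" parameters instead of using one fixed setting.
    sigma = rng.uniform(2.0, 6.0)       # unknown scattering strength
    gain = rng.uniform(0.8, 1.2)        # detector gain drift
    offset = rng.normal(0.0, 0.05)      # baseline offset
    noisy = render_through_scattering(obj, sigma, gain, offset)
    return noisy + rng.normal(0.0, 0.01, obj.shape)   # shot/readout noise

obj = np.zeros((32, 32)); obj[12:20, 12:20] = 1.0     # a simple square "object"
batch = np.stack([synth_example(obj) for _ in range(16)])
print(batch.shape)                                     # (16, 32, 32) training inputs
```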
IEEE Transactions on Computational Imaging (2017)
Traditional cameras require a lens and a megapixel sensor to capture images. The lens focuses light from the scene onto the sensor. We demonstrate a new imaging method that is lensless and requires only a single pixel for imaging. Compared to previous single-pixel cameras, our system allows significantly faster and more efficient acquisition. This is achieved by using ultrafast time-resolved measurement with compressive sensing. The time-resolved sensing adds information to the measurement, so fewer measurements are needed and the acquisition is faster. Lensless and single-pixel imaging computationally resolves major constraints in imaging system design. Notable applications include imaging in challenging parts of the spectrum (such as infrared and THz) and in challenging environments where using a lens is problematic.
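The reconstruction side of such a system is a sparse-recovery problem. A minimal sketch using ISTA (iterative soft thresholding) to recover a sparse scene from a few measurements; the random sensing matrix here is an illustrative stand-in for the actual time-resolved measurement operator:

```python
# Minimal compressive-sensing reconstruction sketch: recover a sparse scene x
# from a few single-pixel measurements y = A @ x via ISTA. Time-resolved
# sensing effectively adds rows to A, so fewer illumination patterns are
# needed; here A is random for illustration.
import numpy as np

rng = np.random.default_rng(2)
n, m = 256, 64                          # scene pixels, measurements (m << n)
x_true = np.zeros(n)
x_true[rng.choice(n, 8, replace=False)] = 1.0          # sparse scene
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true

def ista(A, y, lam=0.01, iters=500):
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2             # 1 / Lipschitz constant
    for _ in range(iters):
        x = x + step * A.T @ (y - A @ x)               # gradient step on ||y - Ax||^2
        x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0)  # soft threshold
    return x

x_hat = ista(A, y)
print("recovery error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```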
Adding an emoji search box to Gmail was one of the projects I worked on during my internship at Google.
Drop marbles to solve puzzles in this addictive and challenging game. Place a variety of colored marbles into funnels so that they arrive in their correct bins. Along the way, watch out for switches, tippers, color mixers, and other traps!
Download high-quality Spotify songs from their URLs. The program reconstructs the PCM frames into a 320 kbps MP3 with metadata. This was not designed to be malicious, simply a personal programming challenge.
SleepTight is designed to improve scoliosis brace usage by increasing patient comfort to encourage consistent use. This is achieved by tightening the brace straps with motors after the patient has fallen asleep. In this project I designed and built the electronics, in addition to serving as team leader.
Enjoy fast and fun mobile conversation! Cluck gets your friend's attention like a phone call but lets you text instead of talk.
Motor-controlled dolly and camera track driven by an Arduino microcontroller. It can be used to create moving time-lapse videos.
Create an interactive shared playlist with your friends and watch as songs, movies, and YouTube videos are played perfectly in sync among all devices.