GRAIL Research Lab
Various Presenters (Allen School)
Research Talk
Thursday, October 26, 2023, 3:30 pm
Abstract
Presenters: Felix Hähnlein; Milin Kodnongbua; Mengyi Shan; Luyang Zhu; Vivek Jayaram; Jiafei Duan
Speaker: Felix Hähnlein
Title: Debugging CAD programs
Abstract: Computer-Aided Design (CAD) systems are widely used to model human-made objects. Compared to other modeling paradigms, one of the key promises of CAD is the ability to easily edit a design by changing only a couple of parameters. However, in practice, editing can lead to unexpected changes and even execution errors, which are time-consuming to correct.
In this work, we study what kinds of errors users of modern CAD systems commonly encounter and what strategies they employ to overcome them. Based on our observations, we propose a debugging system tailored to provide users with information about the underlying program structure of their CAD model.
Bio: Felix Hähnlein did his PhD at Inria, France, working on sketch-based modeling for design sketches and on non-photorealistic rendering for CAD models. He currently pursues his interest in design sequences as a postdoc with Gilbert Bernstein and Adriana Schulz.
Speaker: Milin Kodnongbua
Title: Zero-shot CAD Program Re-Parameterization for Interactive Manipulation
Abstract: Parametric CAD models encode entire families of shapes that should, in principle, be easy for designers to explore. However, in practice, parametric CAD models can be difficult to manipulate due to implicit semantic constraints among parameter values. Finding and enforcing these semantic constraints solely from geometry or programmatic shape representations is not possible because these constraints ultimately reflect design intent: they are informed by the designer's experience and semantics in the real world. To address this challenge, we introduce ReparamCAD, a zero-shot pipeline that leverages pre-trained large language and image models to infer a meaningful space of variations for a shape. We then re-parameterize the model into a new constrained parametric CAD program that captures these variations, enabling effortless exploration of the design space along meaningful design axes. We evaluated our approach through five examples and a user study. The results show that the inferred spaces are meaningful and comparable to those defined by experts.
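To make the idea of re-parameterization concrete, here is a minimal, purely illustrative sketch in Python: a few semantic axes drive several raw CAD parameters at once, with a constraint baked into the mapping. The object, parameter names, and coupling are hypothetical and are not ReparamCAD's actual output.

```python
# Toy "re-parameterized CAD program": semantic axes -> raw parameters,
# with one constraint (handle stays attached to the mug wall) baked in.
from dataclasses import dataclass

@dataclass
class MugParams:
    body_radius: float
    body_height: float
    handle_radius: float
    handle_offset: float

def reparameterized_mug(size: float, handle_scale: float) -> MugParams:
    """'size' scales the whole body; 'handle_scale' resizes only the handle."""
    body_radius = 30.0 * size
    body_height = 80.0 * size
    handle_radius = 12.0 * size * handle_scale
    # Constraint: the handle remains tangent to the mug wall.
    handle_offset = body_radius + handle_radius
    return MugParams(body_radius, body_height, handle_radius, handle_offset)

print(reparameterized_mug(size=1.2, handle_scale=0.8))
```

Editing either semantic axis keeps the model valid, whereas editing the raw parameters independently could detach the handle.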
Bio: Milin's research area is computational design and fabrication. They focus on the optimization of 3D objects and on combining machine learning and programming languages techniques to infer, understand, and reconstruct 3D objects.
Speaker: Mengyi Shan
Title: Animating Street View
Abstract: We present a system that automatically brings street view imagery to life by populating it with naturally behaving, animated pedestrians and vehicles. Our approach is to remove existing people and vehicles from the input image; insert moving objects with proper scale, angle, motion, and appearance; plan paths and traffic behavior; and render the scene with plausible occlusion and shadowing effects. The system achieves this by reconstructing the street scene from the still image, simulating crowd behavior, and rendering with consistent lighting, visibility, occlusions, and shadows. We demonstrate results on a diverse range of street scenes, including regular still images and panoramas.
Bio: Mengyi Shan is a third-year PhD student affiliated with the Graphics and Imaging Lab (GRAIL) and the Reality Lab. She is co-advised by Steve Seitz, Brian Curless and Ira Kemelmacher-Shlizerman. She works on computer vision and computer graphics.
Speaker: Luyang Zhu
Title: TryOnDiffusion: A Tale of Two UNets
Abstract: Given two images depicting a person and a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person. A key challenge is to synthesize a photorealistic, detail-preserving visualization of the garment while warping the garment to accommodate a significant body pose and shape change across the subjects. Previous methods either focus on garment detail preservation without effective pose and shape variation, or allow try-on with the desired shape and pose but lack garment details. In this paper, we propose a diffusion-based architecture that unifies two UNets (referred to as Parallel-UNet), which allows us to preserve garment details and warp the garment for significant pose and body change in a single network. The key ideas behind Parallel-UNet include: 1) the garment is warped implicitly via a cross-attention mechanism, and 2) garment warping and person blending happen as part of a unified process rather than as a sequence of two separate tasks. Experimental results indicate that TryOnDiffusion achieves state-of-the-art performance both qualitatively and quantitatively.
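As background on the general mechanism, the sketch below (PyTorch) shows person features attending to garment features via cross-attention, so garment detail is "warped" implicitly by where the attention weights land. Shapes, dimensions, and the residual blend are placeholder assumptions; this is not the Parallel-UNet implementation.

```python
import torch
import torch.nn as nn

class GarmentCrossAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, person_feats, garment_feats):
        # person_feats:  (B, N_person, dim)  -- queries
        # garment_feats: (B, N_garment, dim) -- keys/values
        warped, _ = self.attn(person_feats, garment_feats, garment_feats)
        return person_feats + warped  # residual blend of person and garment

x = torch.randn(2, 64 * 64, 256)   # flattened person UNet features (toy)
g = torch.randn(2, 64 * 64, 256)   # flattened garment UNet features (toy)
print(GarmentCrossAttention(256)(x, g).shape)  # torch.Size([2, 4096, 256])
```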
Bio: Luyang Zhu is a final-year Ph.D. student with GRAIL, co-advised by Prof. Ira Kemelmacher-Shlizerman, Prof. Steven Seitz, and Prof. Brian Curless. He is broadly interested in computer vision and graphics. His current research focuses on human reconstruction and synthesis.
Speaker: Vivek Jayaram
Title: HRTF Estimation in the Wild
Abstract: Head Related Transfer Functions (HRTFs) play a crucial role in creating immersive spatial audio experiences. However, HRTFs differ significantly from person to person, and traditional methods for estimating personalized HRTFs are expensive, time-consuming, and require specialized equipment. In this paper, we present a new method to measure a listener's personalized HRTF using only headphones and sounds in their environment.
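For readers unfamiliar with HRTFs, the sketch below shows what one does once measured: convolving a mono source with a left/right head-related impulse response pair for a given direction yields a binaural signal. The data here is synthetic and the function is a placeholder illustration, not the estimation method from the talk.

```python
import numpy as np

def binauralize(mono: np.ndarray, hrir_left: np.ndarray,
                hrir_right: np.ndarray) -> np.ndarray:
    """Return a (num_samples, 2) stereo array for one source direction."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)

# Toy data: 1 second of noise at 16 kHz and 128-tap impulse responses.
rng = np.random.default_rng(0)
mono = rng.standard_normal(16000)
hrir_l, hrir_r = rng.standard_normal(128), rng.standard_normal(128)
print(binauralize(mono, hrir_l, hrir_r).shape)  # (16127, 2)
```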
Bio: Vivek is a 5th year PhD student in the GRAIL lab. His interests lie in machine learning for music, audio, and speech.
Speaker: Jiafei Duan
Title: Democratizing Robot Learning for All
Abstract: Diligently gathered human demonstrations serve as the unsung heroes empowering the progression of robot learning. Today, demonstrations are collected by training people to use specialized controllers, which (tele-)operate robots to manipulate a small number of objects. By contrast, we introduce AR2-D2: a system for collecting demonstrations which (1) does not require people with specialized training, (2) does not require any real robots during data collection, and therefore, (3) enables manipulation of diverse objects with a real robot. AR2-D2 is a framework in the form of an iOS app that people can use to record a video of themselves manipulating any object while simultaneously capturing essential data modalities for training a real robot. We show that data collected via our system enables the training of behavior cloning agents in manipulating real objects. Our experiments show that training with our AR data is as effective as training with real-world robot demonstrations. Moreover, our user study indicates that users find AR2-D2 intuitive to use and require no training, unlike four other frequently employed methods for collecting robot demonstrations. We believe AR2-D2 can thus help democratize robot learning for all.
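As a rough illustration of the kind of behavior cloning agent such demonstrations can train, the sketch below (PyTorch) performs one supervised update that regresses actions from observation features. The network, feature size, and 7-DoF action dimension are placeholder assumptions, not AR2-D2's training setup.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 7))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def bc_step(obs_feats: torch.Tensor, demo_actions: torch.Tensor) -> float:
    """One behavior-cloning update toward the demonstrated actions."""
    loss = nn.functional.mse_loss(policy(obs_feats), demo_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy batch: 32 observation embeddings and 7-DoF target actions.
print(bc_step(torch.randn(32, 512), torch.randn(32, 7)))
```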
Bio: Jiafei Duan received his B.Eng (Highest Distinction) from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, under the A*STAR Undergraduate Scholarship. He is currently a second-year PhD student in the Robotics and State Estimation Lab at the University of Washington, advised by Professors Dieter Fox and Ranjay Krishna. His research interests lie at the intersection of computer vision and robot learning.