MAIN CONFERENCE
All papers will be presented in the same manner. Each paper will have a five-minute pre-recorded video and a PDF of its poster. An asynchronous text chat will be available for each paper. Attendees can view the papers and videos on demand at any time. Authors will also hold individual Q&A sessions at the times posted below.
All posted times are EDT; the chart linked below converts them to other time zones. Once the virtual site is up, you will be able to select the sessions you are interested in, and the site will populate your personal schedule.
Presentation Schedule
All times are Eastern Daylight Time
Date: Thursday, June 24, 2021, 22:00 – 24:30
Paper Session Eleven:
Paper ID | Paper Title | Authors |
3090 | POSEFusion: Pose-Guided Selective Fusion for Single-View Human Volumetric Capture | Zhe Li, Tao Yu, Zerong Zheng, Kaiwen Guo, Yebin Liu |
4439 | FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds | Haiyan Wang, Jiahao Pang, Muhammad A. Lodhi, Yingli Tian, Dong Tian |
4757 | Isometric Multi-Shape Matching | Maolin Gao, Zorah Lähner, Johan Thunberg, Daniel Cremers, Florian Bernard |
2475 | PatchmatchNet: Learned Multi-View Patchmatch Stereo | Fangjinhua Wang, Silvano Galliani, Christoph Vogel, Pablo Speciale, Marc Pollefeys |
840 | Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos | Hehe Fan, Yi Yang, Mohan Kankanhalli |
4104 | Learning To Aggregate and Personalize 3D Face From In-the-Wild Photo Collection | Zhenyu Zhang, Yanhao Ge, Renwang Chen, Ying Tai, Yan Yan, Jian Yang, Chengjie Wang, Jilin Li, Feiyue Huang |
250 | MagFace: A Universal Representation for Face Recognition and Quality Assessment | Qiang Meng, Shichao Zhao, Zhida Huang, Feng Zhou |
5291 | Event-Based Synthetic Aperture Imaging With a Hybrid Network | Xiang Zhang, Wei Liao, Lei Yu, Wen Yang, Gui-Song Xia |
300 | GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution | Kelvin C.K. Chan, Xintao Wang, Xiangyu Xu, Jinwei Gu, Chen Change Loy |
10597 | NPAS: A Compiler-Aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration | Zhengang Li, Geng Yuan, Wei Niu, Pu Zhao, Yanyu Li, Yuxuan Cai, Xuan Shen, Zheng Zhan, Zhenglun Kong, Qing Jin, Zhiyu Chen, Sijia Liu, Kaiyuan Yang, Bin Ren, Yanzhi Wang, Xue Lin |
283 | Privacy-Preserving Image Features via Adversarial Affine Subspace Embeddings | Mihai Dusmanu, Johannes L. Schönberger, Sudipta N. Sinha, Marc Pollefeys |
8523 | Image Generators With Conditionally-Independent Pixel Synthesis | Ivan Anokhin, Kirill Demochkin, Taras Khakhulin, Gleb Sterkin, Victor Lempitsky, Denis Korzhenkov |
7086 | CoMoGAN: Continuous Model-Guided Image-to-Image Translation | Fabio Pizzati, Pietro Cerri, Raoul de Charette |
3638 | Positive-Congruent Training: Towards Regression-Free Model Updates | Sijie Yan, Yuanjun Xiong, Kaustav Kundu, Shuo Yang, Siqi Deng, Meng Wang, Wei Xia, Stefano Soatto |
5052 | Capsule Network Is Not More Robust Than Convolutional Network | Jindong Gu, Volker Tresp, Han Hu |
1228 | Dual-Stream Multiple Instance Learning Network for Whole Slide Image Classification With Self-Supervised Contrastive Learning | Bin Li, Yin Li, Kevin W. Eliceiri |
919 | Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking | Fatemeh Saleh, Sadegh Aliakbarian, Hamid Rezatofighi, Mathieu Salzmann, Stephen Gould |
6367 | Adaptive Methods for Real-World Domain Generalization | Abhimanyu Dubey, Vignesh Ramanathan, Alex Pentland, Dhruv Mahajan |
4382 | Self-Supervised Geometric Perception | Heng Yang, Wei Dong, Luca Carlone, Vladlen Koltun |
1702 | HITNet: Hierarchical Iterative Tile Refinement Network for Real-time Stereo Matching | Vladimir Tankovich, Christian Häne, Yinda Zhang, Adarsh Kowdle, Sean Fanello, Sofien Bouaziz |
1348 | Bidirectional Projection Network for Cross Dimension Scene Understanding | Wenbo Hu, Hengshuang Zhao, Li Jiang, Jiaya Jia, Tien-Tsin Wong |
3364 | A Fourier-Based Framework for Domain Generalization | Qinwei Xu, Ruipeng Zhang, Ya Zhang, Yanfeng Wang, Qi Tian |
5421 | Open-Vocabulary Object Detection Using Captions | Alireza Zareian, Kevin Dela Rosa, Derek Hao Hu, Shih-Fu Chang |
7124 | MP3: A Unified Model To Map, Perceive, Predict and Plan | Sergio Casas, Abbas Sadat, Raquel Urtasun |
6158 | Hierarchical Lovász Embeddings for Proposal-Free Panoptic Segmentation | Tommi Kerola, Jie Li, Atsushi Kanehira, Yasunori Kudo, Alexis Vallet, Adrien Gaidon |
10741 | Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss | Lu Zhang, Shuigeng Zhou, Jihong Guan, Ji Zhang |
7790 | GLAVNet: Global-Local Audio-Visual Cues for Fine-Grained Material Recognition | Fengmin Shi, Jie Guo, Haonan Zhang, Shan Yang, Xiying Wang, Yanwen Guo |
4506 | Multi-Scale Aligned Distillation for Low-Resolution Detection | Lu Qi, Jason Kuen, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya Jia |
2829 | Sparse R-CNN: End-to-End Object Detection With Learnable Proposals | Peize Sun, Rufeng Zhang, Yi Jiang, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei Li, Zehuan Yuan, Changhu Wang, Ping Luo |
5547 | Learning View Selection for 3D Scenes | Yifan Sun, Qixing Huang, Dun-Yu Hsiao, Li Guan, Gang Hua |
3305 | Multi-Person Implicit Reconstruction From a Single Image | Armin Mustafa, Akin Caliskan, Lourdes Agapito, Adrian Hilton |
8601 | Neural Descent for Visual 3D Human Pose and Shape | Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Zanfir, William T. Freeman, Rahul Sukthankar, Cristian Sminchisescu |
801 | SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud | Wu Zheng, Weiliang Tang, Li Jiang, Chi-Wing Fu |
3198 | SCF-Net: Learning Spatial Contextual Features for Large-Scale Point Cloud Segmentation | Siqi Fan, Qiulei Dong, Fenghua Zhu, Yisheng Lv, Peijun Ye, Fei-Yue Wang |
6371 | Equivariant Point Network for 3D Point Cloud Analysis | Haiwei Chen, Shichen Liu, Weikai Chen, Hao Li, Randall Hill |
4353 | DeepSurfels: Learning Online Appearance Fusion | Marko Mihajlovic, Silvan Weder, Marc Pollefeys, Martin R. Oswald |
4550 | Efficient Deformable Shape Correspondence via Multiscale Spectral Manifold Wavelets Preservation | Ling Hu, Qinsong Li, Shengjun Liu, Xinru Liu |
2531 | Efficient Initial Pose-Graph Generation for Global SfM | Daniel Barath, Dmytro Mishkin, Iván Eichhardt, Ilia Shipachev, Jiří Matas |
10099 | AutoInt: Automatic Integration for Fast Neural Volume Rendering | David B. Lindell, Julien N. P. Martel, Gordon Wetzstein |
1287 | Extreme Rotation Estimation Using Dense Correlation Volumes | Ruojin Cai, Bharath Hariharan, Noah Snavely, Hadar Averbuch-Elor |
468 | A Quasiconvex Formulation for Radial Cameras | Carl Olsson, Viktor Larsson, Fredrik Kahl |
10278 | ReAgent: Point Cloud Registration Using Imitation and Reinforcement Learning | Dominik Bauer, Timothy Patten, Markus Vincze |
10091 | Monocular Depth Estimation via Listwise Ranking Using the Plackett-Luce Model | Julian Lienen, Eyke Hüllermeier, Ralph Ewerth, Nils Nommensen |
7531 | HVPR: Hybrid Voxel-Point Representation for Single-Stage 3D Object Detection | Jongyoun Noh, Sanghoon Lee, Bumsub Ham |
2503 | 3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection | He Wang, Yezhen Cong, Or Litany, Yue Gao, Leonidas J. Guibas |
5573 | Multi-Label Activity Recognition Using Activity-Specific Features and Activity Correlations | Yanyi Zhang, Xinyu Li, Ivan Marsic |
8473 | LaPred: Lane-Aware Prediction of Multi-Modal Future Trajectories of Dynamic Agents | ByeoungDo Kim, Seong Hyeon Park, Seokhwan Lee, Elbek Khoshimjonov, Dongsuk Kum, Junsoo Kim, Jeong Soo Kim, Jun Won Choi |
6457 | Detecting Human-Object Interaction via Fabricated Compositional Learning | Zhi Hou, Baosheng Yu, Yu Qiao, Xiaojiang Peng, Dacheng Tao |
6161 | Understanding the Robustness of Skeleton-Based Action Recognition Under Adversarial Attack | He Wang, Feixiang He, Zhexi Peng, Tianjia Shao, Yong-Liang Yang, Kun Zhou, David Hogg |
6375 | Invisible Perturbations: Physical Adversarial Examples Exploiting the Rolling Shutter Effect | Athena Sayles, Ashish Hooda, Mohit Gupta, Rahul Chatterjee, Earlence Fernandes |
5160 | Bottom-Up Human Pose Estimation via Disentangled Keypoint Regression | Zigang Geng, Ke Sun, Bin Xiao, Zhaoxiang Zhang, Jingdong Wang |
6336 | Semi-Supervised 3D Hand-Object Poses Estimation With Interactions in Time | Shaowei Liu, Hanwen Jiang, Jiarui Xu, Sifei Liu, Xiaolong Wang |
5809 | Inverse Simulation: Reconstructing Dynamic Geometry of Clothed Humans via Optimal Control | Jingfan Guo, Jie Li, Rahul Narain, Hyun Soo Park |
5950 | Populating 3D Scenes by Learning Human-Scene Interaction | Mohamed Hassan, Partha Ghosh, Joachim Tesch, Dimitrios Tzionas, Michael J. Black |
6522 | Towards High Fidelity Face Relighting With Realistic Shadows | Andrew Hou, Ze Zhang, Michel Sarkis, Ning Bi, Yiying Tong, Xiaoming Liu |
2484 | VirFace: Enhancing Face Recognition via Unlabeled Shallow Data | Wenyu Li, Tianchu Guo, Pengyu Li, Binghui Chen, Biao Wang, Wangmeng Zuo, Lei Zhang |
4317 | Birds of a Feather: Capturing Avian Shape Models From Images | Yufu Wang, Nikos Kolotouros, Kostas Daniilidis, Marc Badger |
5010 | Unsupervised Pre-Training for Person Re-Identification | Dengpan Fu, Dongdong Chen, Jianmin Bao, Hao Yang, Lu Yuan, Lei Zhang, Houqiang Li, Dong Chen |
2450 | Indoor Lighting Estimation Using an Event Camera | Zehao Chen, Qian Zheng, Peisong Niu, Huajin Tang, Gang Pan |
6538 | Checkerboard Context Model for Efficient Learned Image Compression | Dailan He, Yaoyan Zheng, Baocheng Sun, Yan Wang, Hongwei Qin |
2144 | Neighbor2Neighbor: Self-Supervised Denoising From Single Noisy Images | Tao Huang, Songjiang Li, Xu Jia, Huchuan Lu, Jianzhuang Liu |
1798 | From Rain Generation to Rain Removal | Hong Wang, Zongsheng Yue, Qi Xie, Qian Zhao, Yefeng Zheng, Deyu Meng |
4459 | Rank-One Prior: Toward Real-Time Scene Recovery | Jun Liu, Wen Liu, Jianing Sun, Tieyong Zeng |
2469 | Robust Reflection Removal With Reflection-Free Flash-Only Cues | Chenyang Lei, Qifeng Chen |
6551 | Multi-Stage Progressive Image Restoration | Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao |
6186 | Shape From Sky: Polarimetric Normal Recovery Under the Sky | Tomoki Ichikawa, Matthew Purri, Ryo Kawahara, Shohei Nobuhara, Kristin Dana, Ko Nishino |
2168 | Cross-MPI: Cross-Scale Stereo for Image Super-Resolution Using Multiplane Images | Yuemei Zhou, Gaochang Wu, Ying Fu, Kun Li, Yebin Liu |
8247 | Deep Perceptual Preprocessing for Video Coding | Aaron Chadha, Yiannis Andreopoulos |
10973 | StyleMix: Separating Content and Style for Enhanced Data Augmentation | Minui Hong, Jinwoo Choi, Gunhee Kim |
5521 | Spoken Moments: Learning Joint Audio-Visual Representations From Video Descriptions | Mathew Monfort, SouYoung Jin, Alexander Liu, David Harwath, Rogerio Feris, James Glass, Aude Oliva |
1509 | Spatially-Adaptive Pixelwise Networks for Fast Image Translation | Tamar Rott Shaham, Michaël Gharbi, Richard Zhang, Eli Shechtman, Tomer Michaeli |
10136 | No Frame Left Behind: Full Video Action Recognition | Xin Liu, Silvia L. Pintea, Fatemeh Karimi Nejadasl, Olaf Booij, Jan C. van Gemert |
10109 | Multiresolution Knowledge Distillation for Anomaly Detection | Mohammadreza Salehi, Niousha Sadjadi, Soroosh Baselizadeh, Mohammad H. Rohban, Hamid R. Rabiee |
4970 | Convolutional Neural Network Pruning With Structural Redundancy Reduction | Zi Wang, Chengcheng Li, Xiangyang Wang |
409 | Representative Forgery Mining for Fake Face Detection | Chengrui Wang, Weihong Deng |
5419 | Neural Prototype Trees for Interpretable Fine-Grained Image Recognition | Meike Nauta, Ron van Bree, Christin Seifert |
7931 | Relevance-CAM: Your Model Already Knows Where To Look | Jeong Ryong Lee, Sewon Kim, Inyong Park, Taejoon Eo, Dosik Hwang |
329 | Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval | Yang Liu, Qingchao Chen, Samuel Albanie |
4936 | Efficient Object Embedding for Spliced Image Retrieval | Bor-Chun Chen, Zuxuan Wu, Larry S. Davis, Ser-Nam Lim |
10581 | Generative PointNet: Deep Energy-Based Learning on Unordered Point Sets for 3D Generation, Reconstruction and Classification | Jianwen Xie, Yifei Xu, Zilong Zheng, Song-Chun Zhu, Ying Nian Wu |
2535 | Anycost GANs for Interactive Image Synthesis and Editing | Ji Lin, Richard Zhang, Frieder Ganz, Song Han, Jun-Yan Zhu |
10382 | Ensembling With Deep Generative Views | Lucy Chai, Jun-Yan Zhu, Eli Shechtman, Phillip Isola, Richard Zhang |
5457 | Continuous Face Aging via Self-Estimated Residual Age Embedding | Zeqi Li, Ruowei Jiang, Parham Aarabi |
633 | ReMix: Towards Image-to-Image Translation With Limited Data | Jie Cao, Luanxuan Hou, Ming-Hsuan Yang, Ran He, Zhenan Sun |
5888 | Unbalanced Feature Transport for Exemplar-Based Image Translation | Fangneng Zhan, Yingchen Yu, Kaiwen Cui, Gongjie Zhang, Shijian Lu, Jianxiong Pan, Changgong Zhang, Feiying Ma, Xuansong Xie, Chunyan Miao |
2759 | Pose-Guided Human Animation From a Single Image in the Wild | Jae Shin Yoon, Lingjie Liu, Vladislav Golyanik, Kripasindhu Sarkar, Hyun Soo Park, Christian Theobalt |
2063 | Context-Aware Layout to Image Generation With Enhanced Object Appearance | Sen He, Wentong Liao, Michael Ying Yang, Yongxin Yang, Yi-Zhe Song, Bodo Rosenhahn, Tao Xiang |
3670 | SetVAE: Learning Hierarchical Composition for Generative Modeling of Set-Structured Data | Jinwoo Kim, Jaehoon Yoo, Juho Lee, Seunghoon Hong |
5460 | Are Labels Always Necessary for Classifier Accuracy Evaluation? | Weijian Deng, Liang Zheng |
2103 | Graph-Based High-Order Relation Discovery for Fine-Grained Recognition | Yifan Zhao, Ke Yan, Feiyue Huang, Jia Li |
384 | Long-Tailed Multi-Label Visual Recognition by Collaborative Training on Uniform and Re-Balanced Samplings | Hao Guo, Song Wang |
10610 | SimPLE: Similar Pseudo Label Exploitation for Semi-Supervised Classification | Zijian Hu, Zhengyu Yang, Xuefeng Hu, Ram Nevatia |
6149 | Cluster-Wise Hierarchical Generative Model for Deep Amortized Clustering | Huafeng Liu, Jiaqi Wang, Liping Jing |
2852 | From Semantic Categories to Fixations: A Novel Weakly-Supervised Visual-Auditory Saliency Detection Approach | Guotao Wang, Chenglizhao Chen, Deng-Ping Fan, Aimin Hao, Hong Qin |
7968 | Gaussian Context Transformer | Dongsheng Ruan, Daiyin Wang, Yuan Zheng, Nenggan Zheng, Min Zheng |
2777 | FP-NAS: Fast Probabilistic Neural Architecture Search | Zhicheng Yan, Xiaoliang Dai, Peizhao Zhang, Yuandong Tian, Bichen Wu, Matt Feiszli |
7620 | Time Adaptive Recurrent Neural Network | Anil Kag, Venkatesh Saligrama |
3925 | Deep Lesion Tracker: Monitoring Lesions in 4D Longitudinal Imaging Studies | Jinzheng Cai, Youbao Tang, Ke Yan, Adam P. Harrison, Jing Xiao, Gigin Lin, Le Lu |
1251 | Reciprocal Landmark Detection and Tracking With Extremely Few Annotations | Jianzhe Lin, Ghazal Sahebzamani, Christina Luong, Fatemeh Taheri Dezaki, Mohammad Jafari, Purang Abolmaesumi, Teresa Tsang |
1629 | LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search | Bin Yan, Houwen Peng, Kan Wu, Dong Wang, Jianlong Fu, Huchuan Lu |
922 | TesseTrack: End-to-End Learnable Multi-Person Articulated 3D Pose Tracking | N Dinesh Reddy, Laurent Guigues, Leonid Pishchulin, Jayan Eledath, Srinivasa G. Narasimhan |
980 | Learning Optical Flow From Still Images | Filippo Aleotti, Matteo Poggi, Stefano Mattoccia |
3576 | Towards Robust Classification Model by Counterfactual and Invariant Data Generation | Chun-Hao Chang, George Alexandru Adam, Anna Goldenberg |
1205 | StablePose: Learning 6D Object Poses From Geometrically Stable Patches | Yifei Shi, Junwen Huang, Xin Xu, Yifan Zhang, Kai Xu |
6959 | The Translucent Patch: A Physical and Universal Attack on Object Detectors | Alon Zolfi, Moshe Kravchik, Yuval Elovici, Asaf Shabtai |
1908 | Dynamic Weighted Learning for Unsupervised Domain Adaptation | Ni Xiao, Lei Zhang |
3461 | DRANet: Disentangling Representation and Adaptation Networks for Unsupervised Cross-Domain Adaptation | Seunghun Lee, Sunghyun Cho, Sunghoon Im |
10659 | Natural Adversarial Examples | Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, Dawn Song |
10237 | Fast End-to-End Learning on Protein Surfaces | Freyr Sverrisson, Jean Feydy, Bruno E. Correia, Michael M. Bronstein |
7358 | Rectification-Based Knowledge Retention for Continual Learning | Pravendra Singh, Pratik Mazumder, Piyush Rai, Vinay P. Namboodiri |
10227 | Cross-Domain Similarity Learning for Face Recognition in Unseen Domains | Masoud Faraki, Xiang Yu, Yi-Hsuan Tsai, Yumin Suh, Manmohan Chandraker |
2866 | Sequence-to-Sequence Contrastive Learning for Text Recognition | Aviad Aberdam, Ron Litman, Shahar Tsiper, Oron Anschel, Ron Slossberg, Shai Mazor, R. Manmatha, Pietro Perona |
6254 | MOOD: Multi-Level Out-of-Distribution Detection | Ziqian Lin, Sreya Dutta Roy, Yixuan Li |
4245 | DeepVideoMVS: Multi-View Stereo on Video With Recurrent Spatio-Temporal Fusion | Arda Düzçeker, Silvano Galliani, Christoph Vogel, Pablo Speciale, Mihai Dusmanu, Marc Pollefeys |
607 | Boundary IoU: Improving Object-Centric Image Segmentation Evaluation | Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov |
6876 | A2-FPN: Attention Aggregation Based Feature Pyramid Network for Instance Segmentation | Miao Hu, Yali Li, Lu Fang, Shengjin Wang |
769 | SSLayout360: Semi-Supervised Indoor Layout Estimation From 360° Panorama | Phi Vu Tran |
2326 | Complete & Label: A Domain Adaptation Approach to Semantic Segmentation of LiDAR Point Clouds | Li Yi, Boqing Gong, Thomas Funkhouser |
4564 | Improved Image Matting via Real-Time User Clicks and Uncertainty Estimation | Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Hanqing Zhao, Weiming Zhang, Nenghai Yu |
2894 | Self-Supervised Augmentation Consistency for Adapting Semantic Segmentation | Nikita Araslanov, Stefan Roth |
11486 | Few-Shot Object Detection via Classification Refinement and Distractor Retreatment | Yiting Li, Haiyue Zhu, Yu Cheng, Wenxin Wang, Chek Sing Teo, Cheng Xiang, Prahlad Vadakkepat, Tong Heng Lee |
4951 | Counterfactual Zero-Shot and Open-Set Visual Recognition | Zhongqi Yue, Tan Wang, Qianru Sun, Xian-Sheng Hua, Hanwang Zhang |
7328 | Learning Deep Latent Variable Models by Short-Run MCMC Inference With Optimal Transport Correction | Dongsheng An, Jianwen Xie, Ping Li |
5886 | Learning Normal Dynamics in Videos With Meta Prototype Network | Hui Lv, Chen Chen, Zhen Cui, Chunyan Xu, Yong Li, Jian Yang |
5092 | MotionRNN: A Flexible Model for Video Prediction With Spacetime-Varying Motions | Haixu Wu, Zhiyu Yao, Jianmin Wang, Mingsheng Long |
364 | Learning To Recommend Frame for Interactive Video Object Segmentation in the Wild | Zhaoyuan Yin, Jia Zheng, Weixin Luo, Shenhan Qian, Hanling Zhang, Shenghua Gao |
3672 | Reciprocal Transformations for Unsupervised Video Object Segmentation | Sucheng Ren, Wenxi Liu, Yongtuo Liu, Haoxin Chen, Guoqiang Han, Shengfeng He |
4241 | RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words | Xuying Zhang, Xiaoshuai Sun, Yunpeng Luo, Jiayi Ji, Yiyi Zhou, Yongjian Wu, Feiyue Huang, Rongrong Ji |
9903 | Revamping Cross-Modal Recipe Retrieval With Hierarchical Transformers and Self-Supervised Learning | Amaia Salvador, Erhan Gundogdu, Loris Bazzani, Michael Donoser |
845 | Visually Informed Binaural Audio Generation without Binaural Audios | Xudong Xu, Hang Zhou, Ziwei Liu, Bo Dai, Xiaogang Wang, Dahua Lin |
1298 | VisualVoice: Audio-Visual Speech Separation With Cross-Modal Consistency | Ruohan Gao, Kristen Grauman |
5128 | Encoder Fusion Network With Co-Attention Embedding for Referring Image Segmentation | Guang Feng, Zhiwei Hu, Lihe Zhang, Huchuan Lu |
9940 | Semantic Audio-Visual Navigation | Changan Chen, Ziad Al-Halah, Kristen Grauman |
8058 | Bridge To Answer: Structure-Aware Graph Interaction Network for Video Question Answering | Jungin Park, Jiyoung Lee, Kwanghoon Sohn |
4588 | Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-View Transformation | Weixiang Yang, Qi Li, Wenxi Liu, Yuanlong Yu, Yuexin Ma, Shengfeng He, Jia Pan |
6278 | Toward Accurate and Realistic Outfits Visualization With Attention to Details | Kedan Li, Min Jin Chong, Jeffrey Zhang, Jingen Liu |
7055 | Interpretable Social Anchors for Human Trajectory Forecasting in Crowds | Parth Kothari, Brian Sifringer, Alexandre Alahi |