MAIN CONFERENCE
All papers will be presented in the same manner. Each paper will have a five-minute pre-recorded video and a PDF of the poster. An asynchronous text chat will be available for each paper. Attendees can view the papers and videos on demand at any time. Authors will also have individual Q&A sessions at the posted times below.
All posted times are in EDT; the chart linked below gives conversions for all time zones. Once the virtual site is up, you will be able to select the sessions you are interested in, and they will populate your personal schedule.
Presentation Schedule
All times are Eastern Daylight Time
Date: Tuesday, June 22, 2021 22:00 – 24:30
Paper Session Five:
Paper ID | Paper Title | Authors |
2677 | Layer-Wise Searching for 1-Bit Detectors | Sheng Xu, Junhe Zhao, Jinhu Lü, Baochang Zhang, Shumin Han, David Doermann |
1285 | Weakly Supervised Learning of Rigid 3D Scene Flow | Zan Gojcic, Or Litany, Andreas Wieser, Leonidas J. Guibas, Tolga Birdal |
3494 | Learning Compositional Radiance Fields of Dynamic Human Heads | Ziyan Wang, Timur Bagautdinov, Stephen Lombardi, Tomas Simon, Jason Saragih, Jessica Hodgins, Michael Zollhöfer |
2703 | Learning Accurate Dense Correspondences and When To Trust Them | Prune Truong, Martin Danelljan, Luc Van Gool, Radu Timofte |
5094 | RSN: Range Sparse Net for Efficient, Accurate LiDAR 3D Object Detection | Pei Sun, Weiyue Wang, Yuning Chai, Gamaleldin Elsayed, Alex Bewley, Xiao Zhang, Cristian Sminchisescu, Dragomir Anguelov |
8619 | LAFEAT: Piercing Through Adversarial Defenses With Latent Features | Yunrui Yu, Xitong Gao, Cheng-Zhong Xu |
2715 | Function4D: Real-Time Human Volumetric Capture From Very Sparse Consumer RGBD Sensors | Tao Yu, Zerong Zheng, Kaiwen Guo, Pengpeng Liu, Qionghai Dai, Yebin Liu |
5093 | Polka Lines: Learning Structured Illumination and Reconstruction for Active Stereo | Seung-Hwan Baek, Felix Heide |
5040 | FBI-Denoiser: Fast Blind Image Denoiser for Poisson-Gaussian Noise | Jaeseok Byun, Sungmin Cha, Taesup Moon |
6136 | Face Forensics in the Wild | Tianfei Zhou, Wenguan Wang, Zhiyuan Liang, Jianbing Shen |
1664 | Exploring Adversarial Fake Images on Face Manifold | Dongze Li, Wei Wang, Hongxing Fan, Jing Dong |
3471 | Pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis | Eric R. Chan, Marco Monteiro, Petr Kellnhofer, Jiajun Wu, Gordon Wetzstein |
5563 | Animating Pictures With Eulerian Motion Fields | Aleksander Holynski, Brian L. Curless, Steven M. Seitz, Richard Szeliski |
4843 | DriveGAN: Towards a Controllable High-Quality Neural Simulation | Seung Wook Kim, Jonah Philion, Antonio Torralba, Sanja Fidler |
8142 | Towards Open World Object Detection | K J Joseph, Salman Khan, Fahad Shahbaz Khan, Vineeth N Balasubramanian |
8778 | DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation | Yufan He, Dong Yang, Holger Roth, Can Zhao, Daguang Xu |
4887 | Siamese Natural Language Tracker: Tracking by Natural Language Descriptions With Siamese Trackers | Qi Feng, Vitaly Ablavsky, Qinxun Bai, Stan Sclaroff |
2551 | Where and What? Examining Interpretable Disentangled Representations | Xinqi Zhu, Chang Xu, Dacheng Tao |
8346 | Prototype Augmentation and Self-Supervision for Incremental Learning | Fei Zhu, Xu-Yao Zhang, Chuang Wang, Fei Yin, Cheng-Lin Liu |
11578 | Brain Image Synthesis With Unsupervised Multivariate Canonical CSCl4Net | Yawen Huang, Feng Zheng, Danyang Wang, Weilin Huang, Matthew R. Scott, Ling Shao |
3386 | Polygonal Building Extraction by Frame Field Learning | Nicolas Girard, Dmitriy Smirnov, Justin Solomon, Yuliya Tarabalka |
10853 | InverseForm: A Loss Function for Structured Boundary-Aware Segmentation | Shubhankar Borse, Ying Wang, Yizhe Zhang, Fatih Porikli |
3473 | SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation | Brendan Duke, Abdalla Ahmed, Christian Wolf, Parham Aarabi, Graham W. Taylor |
5618 | Visual Room Rearrangement | Luca Weihs, Matt Deitke, Aniruddha Kembhavi, Roozbeh Mottaghi |
3931 | A Deep Emulator for Secondary Motion of 3D Characters | Mianlun Zheng, Yi Zhou, Duygu Ceylan, Jernej Barbič |
1582 | Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection | Qize Yang, Xihan Wei, Biao Wang, Xian-Sheng Hua, Lei Zhang |
4174 | UniT: Unified Knowledge Transfer for Any-Shot Object Detection and Segmentation | Siddhesh Khandelwal, Raghav Goyal, Leonid Sigal |
7446 | Unsupervised Object Detection With LIDAR Clues | Hao Tian, Yuntao Chen, Jifeng Dai, Zhaoxiang Zhang, Xizhou Zhu |
6919 | Implicit Feature Alignment: Learn To Convert Text Recognizer to Text Spotter | Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, Dezhi Peng, Zhe Li, Mengchao He, Yongpan Wang, Canjie Luo |
6806 | Self-Attention Based Text Knowledge Mining for Text Detection | Qi Wan, Haoqin Ji, Linlin Shen |
1441 | Shallow Feature Matters for Weakly Supervised Object Localization | Jun Wei, Qin Wang, Zhen Li, Sheng Wang, S. Kevin Zhou, Shuguang Cui |
1493 | Self-Supervised 3D Mesh Reconstruction From Single Images | Tao Hu, Liwei Wang, Xiaogang Xu, Shu Liu, Jiaya Jia |
5189 | Sketch2Model: View-Aware 3D Modeling From Single Free-Hand Sketches | Song-Hai Zhang, Yuan-Chen Guo, Qing-Wen Gu |
4646 | Learning Parallel Dense Correspondence From Spatio-Temporal Descriptors for Efficient and Robust 4D Reconstruction | Jiapeng Tang, Dan Xu, Kui Jia, Lei Zhang |
2466 | Refer-It-in-RGBD: A Bottom-Up Approach for 3D Visual Grounding in RGBD Images | Haolin Liu, Anran Lin, Xiaoguang Han, Lei Yang, Yizhou Yu, Shuguang Cui |
4139 | VoxelContext-Net: An Octree Based Framework for Point Cloud Compression | Zizheng Que, Guo Lu, Dong Xu |
8215 | CorrNet3D: Unsupervised End-to-End Learning of Dense Correspondence for 3D Point Clouds | Yiming Zeng, Yue Qian, Zhiyu Zhu, Junhui Hou, Hui Yuan, Ying He |
10371 | Inferring CAD Modeling Sequences Using Zone Graphs | Xianghao Xu, Wenzhe Peng, Chin-Yi Cheng, Karl D.D. Willis, Daniel Ritchie |
4307 | Seeing Behind Objects for 3D Multi-Object Tracking in RGB-D Sequences | Norman Müller, Yu-Shiang Wong, Niloy J. Mitra, Angela Dai, Matthias Nießner |
1377 | View Generalization for Single Image Textured 3D Models | Anand Bhattad, Aysegul Dundar, Guilin Liu, Andrew Tao, Bryan Catanzaro |
2543 | A Decomposition Model for Stereo Matching | Chengtang Yao, Yunde Jia, Huijun Di, Pengxiang Li, Yuwei Wu |
10955 | VS-Net: Voting With Segmentation for Visual Localization | Zhaoyang Huang, Han Zhou, Yijin Li, Bangbang Yang, Yan Xu, Xiaowei Zhou, Hujun Bao, Guofeng Zhang, Hongsheng Li |
4615 | MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments From a Single Moving Camera | Felix Wimbauer, Nan Yang, Lukas von Stumberg, Niclas Zeller, Daniel Cremers |
10304 | Shape and Material Capture at Home | Daniel Lichy, Jiaye Wu, Soumyadip Sengupta, David W. Jacobs |
1212 | Offboard 3D Object Detection From Point Cloud Sequences | Charles R. Qi, Yin Zhou, Mahyar Najibi, Pei Sun, Khoa Vo, Boyang Deng, Dragomir Anguelov |
4205 | M3DSSD: Monocular 3D Single Stage Object Detector | Shujie Luo, Hang Dai, Ling Shao, Yong Ding |
1812 | 2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition | Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis |
3850 | Deep Analysis of CNN-Based Spatio-Temporal Representations for Action Recognition | Chun-Fu Richard Chen, Rameswar Panda, Kandan Ramakrishnan, Rogerio Feris, John Cohn, Aude Oliva, Quanfu Fan |
2266 | The Blessings of Unlabeled Background in Untrimmed Videos | Yuan Liu, Jingyuan Chen, Zhenfang Chen, Bing Deng, Jianqiang Huang, Hanwang Zhang |
1767 | PointGuard: Provably Robust 3D Point Cloud Classification | Hongbin Liu, Jinyuan Jia, Neil Zhenqiang Gong |
7898 | DSRNA: Differentiable Search of Robust Neural Architectures | Ramtin Hosseini, Xingyi Yang, Pengtao Xie |
4844 | Backdoor Attacks Against Deep Learning Systems in the Physical World | Emily Wenger, Josephine Passananti, Arjun Nitin Bhagoji, Yuanshun Yao, Haitao Zheng, Ben Y. Zhao |
4971 | Riggable 3D Face Reconstruction via In-Network Optimization | Ziqian Bai, Zhaopeng Cui, Xiaoming Liu, Ping Tan |
2678 | NeuralHumanFVV: Real-Time Neural Volumetric Human Performance Rendering Using RGB Cameras | Xin Suo, Yuheng Jiang, Pei Lin, Yingliang Zhang, Minye Wu, Kaiwen Guo, Lan Xu |
1439 | Context Modeling in 3D Human Pose Estimation: A Unified Perspective | Xiaoxuan Ma, Jiajun Su, Chunyu Wang, Hai Ci, Yizhou Wang |
1162 | Dive Into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition | Jiahui She, Yibo Hu, Hailin Shi, Jun Wang, Qiu Shen, Tao Mei |
10601 | Lifting 2D StyleGAN for 3D-Aware Face Generation | Yichun Shi, Divyansh Aggarwal, Anil K. Jain |
3006 | Hybrid Message Passing With Performance-Driven Structures for Facial Action Unit Detection | Tengfei Song, Zijun Cui, Wenming Zheng, Qiang Ji |
837 | Learning to Generalize Unseen Domains via Memory-based Multi-Source Meta-Learning for Person Re-Identification | Yuyang Zhao, Zhun Zhong, Fengxiang Yang, Zhiming Luo, Yaojin Lin, Shaozi Li, Nicu Sebe |
3736 | Invertible Image Signal Processing | Yazhou Xing, Zian Qian, Qifeng Chen |
5579 | End-to-End High Dynamic Range Camera Pipeline Optimization | Nicolas Robidoux, Luis E. García Capel, Dong-eun Seo, Avinash Sharma, Federico Ariza, Felix Heide |
3087 | Blind Deblurring for Saturated Images | Liang Chen, Jiawei Zhang, Songnan Lin, Faming Fang, Jimmy S. Ren |
11535 | Extreme Low-Light Environment-Driven Image Denoising Over Permanently Shadowed Lunar Regions With a Physical Noise Model | Ben Moseley, Valentin Bickel, Ignacio G. López-Francos, Loveneesh Rana |
8642 | Controlling the Rain: From Removal to Rendering | Siqi Ni, Xueyun Cao, Tao Yue, Xuemei Hu |
340 | De-Rendering the World’s Revolutionary Artefacts | Shangzhe Wu, Ameesh Makadia, Jiajun Wu, Noah Snavely, Richard Tucker, Angjoo Kanazawa |
4703 | Progressively Complementary Network for Fisheye Image Rectification Using Appearance Flow | Shangrong Yang, Chunyu Lin, Kang Liao, Chunjie Zhang, Yao Zhao |
4442 | High-Speed Image Reconstruction Through Short-Term Plasticity for Spiking Cameras | Yajing Zheng, Lingxiao Zheng, Zhaofei Yu, Boxin Shi, Yonghong Tian, Tiejun Huang |
886 | MASA-SR: Matching Acceleration and Spatial Adaptation for Reference-Based Image Super-Resolution | Liying Lu, Wenbo Li, Xin Tao, Jiangbo Lu, Jiaya Jia |
6315 | Single Pair Cross-Modality Super Resolution | Guy Shacht, Dov Danon, Sharon Fogel, Daniel Cohen-Or |
3475 | Temporal Modulation Network for Controllable Space-Time Video Super-Resolution | Gang Xu, Jun Xu, Zhen Li, Liang Wang, Xing Sun, Ming-Ming Cheng |
3446 | The Multi-Temporal Urban Development SpaceNet Dataset | Adam Van Etten, Daniel Hogan, Jesus Martinez Manso, Jacob Shermeyer, Nicholas Weir, Ryan Lewis |
9937 | Euro-PVI: Pedestrian Vehicle Interactions in Dense Urban Centers | Apratim Bhattacharyya, Daniel Olmeda Reino, Mario Fritz, Bernt Schiele |
9933 | AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling | Dilin Wang, Meng Li, Chengyue Gong, Vikas Chandra |
1462 | Learning Student Networks in the Wild | Hanting Chen, Tianyu Guo, Chang Xu, Wenshuo Li, Chunjing Xu, Chao Xu, Yunhe Wang |
2617 | Towards Compact CNNs via Collaborative Compression | Yuchao Li, Shaohui Lin, Jianzhuang Liu, Qixiang Ye, Mengdi Wang, Fei Chao, Fan Yang, Jincheng Ma, Qi Tian, Rongrong Ji |
4631 | Network Quantization With Element-Wise Gradient Scaling | Junghyup Lee, Dohyung Kim, Bumsub Ham |
10966 | Frequency-Aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection | Jiaming Li, Hongtao Xie, Jiahong Li, Zhongyuan Wang, Yongdong Zhang |
11528 | Building Reliable Explanations of Unreliable Neural Networks: Locally Smoothing Perspective of Model Interpretation | Dohun Lim, Hyeonseok Lee, Sungchan Kim |
5056 | Perceptual Indistinguishability-Net (PI-Net): Facial Image Obfuscation With Manipulable Semantics | Jia-Wei Chen, Li-Ju Chen, Chia-Mu Yu, Chun-Shien Lu |
3915 | Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization | Aysim Toker, Qunjie Zhou, Maxim Maximov, Laura Leal-Taixé |
1264 | Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes | Zhengqi Li, Simon Niklaus, Noah Snavely, Oliver Wang |
7778 | Not Just Compete, but Collaborate: Local Image-to-Image Translation via Cooperative Mask Prediction | Daejin Kim, Mohammad Azam Khan, Jaegul Choo |
6651 | Posterior Promoted GAN With Distribution Discriminator for Unsupervised Image Synthesis | Xianchao Zhang, Ziyang Cheng, Xiaotong Zhang, Han Liu |
1519 | Surrogate Gradient Field for Latent Space Manipulation | Minjun Li, Yanghua Jin, Huachun Zhu |
4279 | Image Inpainting Guided by Coherence Priors of Semantics and Textures | Liang Liao, Jing Xiao, Zheng Wang, Chia-Wen Lin, Shin'ichi Satoh |
3233 | Spatially-Invariant Style-Codes Controlled Makeup Transfer | Han Deng, Chu Han, Hongmin Cai, Guoqiang Han, Shengfeng He |
7856 | Memory-Guided Unsupervised Image-to-Image Translation | Somi Jeong, Youngjung Kim, Eungbean Lee, Kwanghoon Sohn |
5568 | Hierarchical Motion Understanding via Motion Programs | Sumith Kulal, Jiayuan Mao, Alex Aiken, Jiajun Wu |
11574 | Adaptive Rank Estimate in Robust Principal Component Analysis | Zhengqin Xu, Rui He, Shoulie Xie, Shiqian Wu |
1500 | Deep Animation Video Interpolation in the Wild | Li Siyao, Shiyu Zhao, Weijiang Yu, Wenxiu Sun, Dimitris Metaxas, Chen Change Loy, Ziwei Liu |
6889 | ECKPN: Explicit Class Knowledge Propagation Network for Transductive Few-Shot Learning | Chaofan Chen, Xiaoshan Yang, Changsheng Xu, Xuhui Huang, Zhe Ma |
4337 | Multi-Objective Interpolation Training for Robustness To Label Noise | Diego Ortego, Eric Arazo, Paul Albert, Noel E. O'Connor, Kevin McGuinness |
8295 | T-vMF Similarity for Regularizing Intra-Class Feature Distribution | Takumi Kobayashi |
9891 | Disentangling Label Distribution for Long-Tailed Visual Recognition | Youngkyu Hong, Seungju Han, Kwanghee Choi, Seokjun Seo, Beomsu Kim, Buru Chang |
2124 | Leveraging the Availability of Two Cameras for Illuminant Estimation | Abdelrahman Abdelhamed, Abhijith Punnappurath, Michael S. Brown |
665 | Decoupled Dynamic Filter Networks | Jingkai Zhou, Varun Jampani, Zhixiong Pi, Qiong Liu, Ming-Hsuan Yang |
4042 | Rethinking Graph Neural Architecture Search From Message-Passing | Shaofei Cai, Liang Li, Jincan Deng, Beichen Zhang, Zheng-Jun Zha, Li Su, Qingming Huang |
7833 | Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search | Yibo Yang, Shan You, Hongyang Li, Fei Wang, Chen Qian, Zhouchen Lin |
3445 | Unsupervised Visual Attention and Invariance for Reinforcement Learning | Xudong Wang, Long Lian, Stella X. Yu |
7331 | Mol2Image: Improved Conditional Flow Models for Molecule to Image Synthesis | Karren Yang, Samuel Goldman, Wengong Jin, Alex X. Lu, Regina Barzilay, Tommi Jaakkola, Caroline Uhler |
11819 | TSGCNet: Discriminative Geometric Feature Learning With Two-Stream Graph Convolutional Network for 3D Dental Model Segmentation | Lingming Zhang, Yue Zhao, Deyu Meng, Zhiming Cui, Chenqiang Gao, Xinbo Gao, Chunfeng Lian, Dinggang Shen |
4215 | IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking | Shuai Jia, Yibing Song, Chao Ma, Xiaokang Yang |
5594 | GMOT-40: A Benchmark for Generic Multiple Object Tracking | Hexin Bai, Wensheng Cheng, Peng Chu, Juehuan Liu, Kai Zhang, Haibin Ling |
5806 | Combined Depth Space Based Architecture Search for Person Re-Identification | Hanjun Li, Gaojie Wu, Wei-Shi Zheng |
11627 | Learning an Explicit Weighting Scheme for Adapting Complex HSI Noise | Xiangyu Rui, Xiangyong Cao, Qi Xie, Zongsheng Yue, Qian Zhao, Deyu Meng |
2504 | VaB-AL: Incorporating Class Imbalance and Difficulty With Variational Bayes for Active Learning | Jongwon Choi, Kwang Moo Yi, Jihoon Kim, Jinho Choo, Byoungjip Kim, Jinyeop Chang, Youngjune Gwon, Hyung Jin Chang |
8156 | Learning a Facial Expression Embedding Disentangled From Identity | Wei Zhang, Xianpeng Ji, Keyu Chen, Yu Ding, Changjie Fan |
2939 | SRDAN: Scale-Aware and Range-Aware Domain Adaptation Network for Cross-Dataset 3D Object Detection | Weichen Zhang, Wen Li, Dong Xu |
5079 | Regressive Domain Adaptation for Unsupervised Keypoint Detection | Junguang Jiang, Yifei Ji, Ximei Wang, Yufeng Liu, Jianmin Wang, Mingsheng Long |
4340 | Uncertainty-Guided Model Generalization to Unseen Domains | Fengchun Qiao, Xi Peng |
3342 | Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning | Kai Zhu, Yang Cao, Wei Zhai, Jie Cheng, Zheng-Jun Zha |
3079 | Noise-Resistant Deep Metric Learning With Ranking-Based Instance Selection | Chang Liu, Han Yu, Boyang Li, Zhiqi Shen, Zhanning Gao, Peiran Ren, Xuansong Xie, Lizhen Cui, Chunyan Miao |
5729 | DAT: Training Deep Networks Robust To Label-Noise by Matching the Feature Distributions | Yuntao Qu, Shasha Mo, Jianwei Niu |
7159 | OBoW: Online Bag-of-Visual-Words Generation for Self-Supervised Learning | Spyros Gidaris, Andrei Bursuc, Gilles Puy, Nikos Komodakis, Matthieu Cord, Patrick Pérez |
5169 | Learning Affinity-Aware Upsampling for Deep Image Matting | Yutong Dai, Hao Lu, Chunhua Shen |
8373 | PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation With Neural Positional Encoding and Distilled Matting Loss | Juan Luis Gonzalez, Munchurl Kim |
2649 | RefineMask: Towards High-Quality Instance Segmentation With Fine-Grained Features | Gang Zhang, Xin Lu, Jingru Tan, Jianmin Li, Zhaoxiang Zhang, Quanquan Li, Xiaolin Hu |
6116 | CompositeTasking: Understanding Images by Spatial Composition of Tasks | Nikola Popović, Danda Pani Paudel, Thomas Probst, Guolei Sun, Luc Van Gool |
1035 | Rethinking Semantic Segmentation From a Sequence-to-Sequence Perspective With Transformers | Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H.S. Torr, Li Zhang |
3168 | FSDR: Frequency Space Domain Randomization for Domain Generalization | Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu |
2838 | Transformation Driven Visual Reasoning | Xin Hong, Yanyan Lan, Liang Pang, Jiafeng Guo, Xueqi Cheng |
4630 | Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation | Youngmin Oh, Beomjun Kim, Bumsub Ham |
8453 | Adaptive Consistency Regularization for Semi-Supervised Transfer Learning | Abulikemu Abuduweili, Xingjian Li, Humphrey Shi, Cheng-Zhong Xu, Dejing Dou |
2332 | Self-Generated Defocus Blur Detection via Dual Adversarial Discriminators | Wenda Zhao, Cai Shang, Huchuan Lu |
1397 | Ego-Exo: Transferring Visual Representations From Third-Person to First-Person Videos | Yanghao Li, Tushar Nagarajan, Bo Xiong, Kristen Grauman |
3191 | PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds | Yi Wei, Ziyi Wang, Yongming Rao, Jiwen Lu, Jie Zhou |
3876 | Spatiotemporal Contrastive Video Representation Learning | Rui Qian, Tianjian Meng, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, Yin Cui |
1866 | Deep Video Matting via Spatio-Temporal Alignment and Aggregation | Yanan Sun, Guanzhi Wang, Qiao Gu, Chi-Keung Tang, Yu-Wing Tai |
5988 | Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation | Tianfei Zhou, Jianwu Li, Xueyi Li, Ling Shao |
10384 | Multimodal Contrastive Training for Visual Representation Learning | Xin Yuan, Zhe Lin, Jason Kuen, Jianming Zhang, Yilin Wang, Michael Maire, Ajinkya Kale, Baldo Faieta |
5443 | Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs | Xudong Lin, Gedas Bertasius, Jue Wang, Shih-Fu Chang, Devi Parikh, Lorenzo Torresani |
3368 | Distilling Audio-Visual Knowledge by Compositional Contrastive Learning | Yanbei Chen, Yongqin Xian, A. Sophia Koepke, Ying Shan, Zeynep Akata |
2292 | Structured Multi-Level Interaction Network for Video Moment Localization via Language Query | Hao Wang, Zheng-Jun Zha, Liang Li, Dong Liu, Jiebo Luo |
2590 | Scene-Intuitive Agent for Remote Embodied Visual Grounding | Xiangru Lin, Guanbin Li, Yizhou Yu |
1226 | Domain-Robust VQA With Diverse Datasets and Methods but No Target Labels | Mingda Zhang, Tristan Maidment, Ahmad Diab, Adriana Kovashka, Rebecca Hwa |
3662 | Composing Photos Like a Photographer | Chaoyi Hong, Shuaiyuan Du, Ke Xian, Hao Lu, Zhiguo Cao, Weicai Zhong |
8331 | Dogfight: Detecting Drones From Drones Videos | Muhammad Waseem Ashraf, Waqas Sultani, Mubarak Shah |
427 | Multi-Modal Fusion Transformer for End-to-End Autonomous Driving | Aditya Prakash, Kashyap Chitta, Andreas Geiger |
4018 | Cloud2Curve: Generation and Vectorization of Parametric Sketches | Ayan Das, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song |