MAIN CONFERENCE
All papers will be presented in the same manner. Each paper will have a five-minute pre-recorded video and a PDF of the poster. An asynchronous text chat will be available for each paper. Attendees can view the papers and videos on demand at any time. Authors will also hold individual Q&A sessions at the times posted below.
All posted times are in EDT; the chart linked below gives conversions for all time zones. Once the virtual site is up, you will be able to select the sessions you are interested in, and the site will populate your personal schedule.
Presentation Schedule
All times are Eastern Daylight Time
Date: Thursday, June 24, 2021 11:00 – 13:30
Paper Session Ten:
Paper ID | Paper Title | Authors |
6333 | Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos | Yasamin Jafarian, Hyun Soo Park |
4375 | PointNetLK Revisited | Xueqian Li, Jhony Kaesemodel Pontes, Simon Lucey |
4318 | BRepNet: A Topological Message Passing System for Solid Models | Joseph G. Lambourne, Karl D.D. Willis, Pradeep Kumar Jayaraman, Aditya Sanghi, Peter Meltzer, Hooman Shayani |
7395 | KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control | Tomas Jakab, Richard Tucker, Ameesh Makadia, Jiajun Wu, Noah Snavely, Angjoo Kanazawa |
2408 | Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization | Long Zhao, Yuxiao Wang, Jiaping Zhao, Liangzhe Yuan, Jennifer J. Sun, Florian Schroff, Hartwig Adam, Xi Peng, Dimitris Metaxas, Ting Liu |
1927 | i3DMM: Deep Implicit 3D Morphable Model of Human Heads | Tarun Yenamandra, Ayush Tewari, Florian Bernard, Hans-Peter Seidel, Mohamed Elgharib, Daniel Cremers, Christian Theobalt |
3521 | Reconstructing 3D Human Pose by Watching Humans in the Mirror | Qi Fang, Qing Shuai, Junting Dong, Hujun Bao, Xiaowei Zhou |
1539 | EventZoom: Learning To Denoise and Super Resolve Neuromorphic Events | Peiqi Duan, Zihao W. Wang, Xinyu Zhou, Yi Ma, Boxin Shi |
3102 | Spatially-Varying Outdoor Lighting Estimation From Intrinsics | Yongjie Zhu, Yinda Zhang, Si Li, Boxin Shi |
5385 | Knowledge Evolution in Neural Networks | Ahmed Taha, Abhinav Shrivastava, Larry S. Davis |
10484 | Understanding Failures of Deep Networks via Robust Feature Extraction | Sahil Singla, Besmira Nushi, Shital Shah, Ece Kamar, Eric Horvitz |
5532 | StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation | Zongze Wu, Dani Lischinski, Eli Shechtman |
4270 | Taming Transformers for High-Resolution Image Synthesis | Patrick Esser, Robin Rombach, Björn Ommer |
7246 | Benchmarking Representation Learning for Natural World Image Collections | Grant Van Horn, Elijah Cole, Sara Beery, Kimberly Wilber, Serge Belongie, Oisin Mac Aodha |
10649 | Scaling Local Self-Attention for Parameter Efficient Visual Backbones | Ashish Vaswani, Prajit Ramachandran, Aravind Srinivas, Niki Parmar, Blake Hechtman, Jonathon Shlens |
10078 | IMODAL: Creating Learnable User-Defined Deformation Models | Leander Lacroix, Benjamin Charlier, Alain Trouvé, Barbara Gris |
3008 | Unsupervised Multi-Source Domain Adaptation for Person Re-Identification | Zechen Bai, Zhigang Wang, Jian Wang, Di Hu, Errui Ding |
5556 | Generalization on Unseen Domains via Inference-Time Label-Preserving Target Projections | Prashant Pandey, Mrigank Raman, Sumanth Varambally, Prathosh AP |
2616 | Robust Audio-Visual Instance Discrimination | Pedro Morgado, Ishan Misra, Nuno Vasconcelos |
1269 | Binary TTC: A Temporal Geofence for Autonomous Navigation | Abhishek Badki, Orazio Gallo, Jan Kautz, Pradeep Sen |
800 | LED2-Net: Monocular 360° Layout Estimation via Differentiable Depth Rendering | Fu-En Wang, Yu-Hsuan Yeh, Min Sun, Wei-Chen Chiu, Yi-Hsuan Tsai |
2062 | A Realistic Evaluation of Semi-Supervised Learning for Fine-Grained Classification | Jong-Chyi Su, Zezhou Cheng, Subhransu Maji |
2425 | Seeing Out of the Box: End-to-End Pre-Training for Vision-Language Representation Learning | Zhicheng Huang, Zhaoyang Zeng, Yupan Huang, Bei Liu, Dongmei Fu, Jianlong Fu |
2757 | Intentonomy: A Dataset and Study Towards Human Intent Understanding | Menglin Jia, Zuxuan Wu, Austin Reiter, Claire Cardie, Serge Belongie, Ser-Nam Lim |
1178 | Mutual Graph Learning for Camouflaged Object Detection | Qiang Zhai, Xin Li, Fan Yang, Chenglizhao Chen, Hong Cheng, Deng-Ping Fan |
10700 | Hallucination Improves Few-Shot Object Detection | Weilin Zhang, Yu-Xiong Wang |
7685 | Learning To Predict Visual Attributes in the Wild | Khoi Pham, Kushal Kafle, Zhe Lin, Zhihong Ding, Scott Cohen, Quan Tran, Abhinav Shrivastava |
745 | Scaled-YOLOv4: Scaling Cross Stage Partial Network | Chien-Yao Wang, Alexey Bochkovskiy, Hong-Yuan Mark Liao |
2620 | You Only Look One-Level Feature | Qiang Chen, Yingming Wang, Tong Yang, Xiangyu Zhang, Jian Cheng, Jian Sun |
3884 | Neighborhood Normalization for Robust Geometric Feature Learning | Xingtong Liu, Benjamin D. Killeen, Ayushi Sinha, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath |
2987 | High-Fidelity Face Tracking for AR/VR via Deep Lighting Adaptation | Lele Chen, Chen Cao, Fernando De la Torre, Jason Saragih, Chenliang Xu, Yaser Sheikh |
6689 | Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images | Florian Kluger, Hanno Ackermann, Eric Brachmann, Michael Ying Yang, Bodo Rosenhahn |
587 | Cycle4Completion: Unpaired Point Cloud Completion Using Cycle Transformation With Missing Region Coding | Xin Wen, Zhizhong Han, Yan-Pei Cao, Pengfei Wan, Wen Zheng, Yu-Shen Liu |
3093 | LiDAR-Based Panoptic Segmentation via Dynamic Shifting Network | Fangzhou Hong, Hui Zhou, Xinge Zhu, Hongsheng Li, Ziwei Liu |
6286 | RPSRNet: End-to-End Trainable Rigid Point Set Registration Network Using Barnes-Hut 2D-Tree Representation | Sk Aziz Ali, Kerem Kahraman, Gerd Reis, Didier Stricker |
2397 | Online Learning of a Probabilistic and Adaptive Scene Representation | Zike Yan, Xin Wang, Hongbin Zha |
1679 | Quantum Permutation Synchronization | Tolga Birdal, Vladislav Golyanik, Christian Theobalt, Leonidas J. Guibas |
375 | Wide-Baseline Multi-Camera Calibration Using Person Re-Identification | Yan Xu, Yu-Jhe Li, Xinshuo Weng, Kris Kitani |
10046 | STaR: Self-Supervised Tracking and Reconstruction of Rigid Objects in Motion With Neural Rendering | Wentao Yuan, Zhaoyang Lv, Tanner Schmidt, Steven Lovegrove |
10321 | PatchMatch-Based Neighborhood Consensus for Semantic Correspondence | Jae Yong Lee, Joseph DeGol, Victor Fragoso, Sudipta N. Sinha |
1038 | Learning Feature Aggregation for Deep 3D Morphable Models | Zhixiang Chen, Tae-Kyun Kim |
7342 | A Functional Approach to Rotation Equivariant Non-Linearities for Tensor Field Networks | Adrien Poulenard, Leonidas J. Guibas |
8801 | Generalizing to the Open World: Deep Visual Odometry With Online Adaptation | Shunkai Li, Xin Wu, Yingdian Cao, Hongbin Zha |
6665 | Panoptic-PolarNet: Proposal-Free LiDAR Point Cloud Panoptic Segmentation | Zixiang Zhou, Yang Zhang, Hassan Foroosh |
1136 | 3D Spatial Recognition Without Spatially Labeled 3D | Zhongzheng Ren, Ishan Misra, Alexander G. Schwing, Rohit Girdhar |
4240 | ACTION-Net: Multipath Excitation for Action Recognition | Zhengwei Wang, Qi She, Aljosa Smolic |
7689 | Anticipating Human Actions by Correlating Past With the Future With Jaccard Similarity Measures | Basura Fernando, Samitha Herath |
5941 | Glance and Gaze: Inferring Action-Aware Points for One-Stage Human-Object Interaction Detection | Xubin Zhong, Xian Qu, Changxing Ding, Dacheng Tao |
5952 | How Robust Are Randomized Smoothing Based Defenses to Data Poisoning? | Akshay Mehra, Bhavya Kailkhura, Pin-Yu Chen, Jihun Hamm |
5477 | FaceSec: A Fine-Grained Robustness Evaluation Framework for Face Recognition Systems | Liang Tong, Zhengzhang Chen, Jingchao Ni, Wei Cheng, Dongjin Song, Haifeng Chen, Yevgeniy Vorobeychik |
3534 | Rethinking the Heatmap Regression for Bottom-Up Human Pose Estimation | Zhengxiong Luo, Zhicheng Wang, Yan Huang, Liang Wang, Tieniu Tan, Erjin Zhou |
5102 | Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration | Xingyu Chen, Yufeng Liu, Chongyang Ma, Jianlong Chang, Huayan Wang, Tian Chen, Xiaoyan Guo, Pengfei Wan, Wen Zheng |
5498 | S3: Neural Shape, Skeleton, and Skinning Fields for 3D Human Modeling | Ze Yang, Shenlong Wang, Sivabalan Manivasagam, Zeng Huang, Wei-Chiu Ma, Xinchen Yan, Ersin Yumer, Raquel Urtasun |
5404 | CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild | Bastian Wandt, Marco Rudolph, Petrissa Zell, Helge Rhodin, Bodo Rosenhahn |
5284 | Lipstick Ain’t Enough: Beyond Color Matching for In-the-Wild Makeup Transfer | Thao Nguyen, Anh Tuan Tran, Minh Hoai |
1804 | Virtual Fully-Connected Layer: Training a Large-Scale Face Recognition Dataset With Limited Computational Resources | Pengyu Li, Biao Wang, Lei Zhang |
3673 | Learning From the Master: Distilling Cross-Modal Advanced Knowledge for Lip Reading | Sucheng Ren, Yong Du, Jianming Lv, Guoqiang Han, Shengfeng He |
4504 | Watching You: Global-Guided Reciprocal Learning for Video-Based Person Re-Identification | Xuehu Liu, Pingping Zhang, Chenyang Yu, Huchuan Lu, Xiaoyun Yang |
11579 | Sparse Multi-Path Corrections in Fringe Projection Profilometry | Yu Zhang, Daniel Lau, David Wipf |
3126 | Attention-Guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton | Xi Zhang, Xiaolin Wu |
603 | Invertible Denoising Network: A Light Solution for Real Noise Removal | Yang Liu, Zhenyue Qin, Saeed Anwar, Pan Ji, Dongwoo Kim, Sabrina Caldwell, Tom Gedeon |
1473 | Multi-Decoding Deraining Network and Quasi-Sparsity Based Training | Yinglong Wang, Chao Ma, Bing Zeng |
3690 | Unsupervised Real-World Image Super Resolution via Domain-Distance Aware Training | Yunxuan Wei, Shuhang Gu, Yawei Li, Radu Timofte, Longcun Jin, Hengjie Song |
2424 | Single Image Reflection Removal With Absorption Effect | Qian Zheng, Boxin Shi, Jinnan Chen, Xudong Jiang, Ling-Yu Duan, Alex C. Kot |
5944 | Exploiting Aliasing for Manga Restoration | Minshan Xie, Menghan Xia, Tien-Tsin Wong |
5908 | Rich Context Aggregation With Reflection Prior for Glass Surface Detection | Jiaying Lin, Zebang He, Rynson W.H. Lau |
2140 | MR Image Super-Resolution With Squeeze and Excitation Reasoning Attention Network | Yulun Zhang, Kai Li, Kunpeng Li, Yun Fu |
5684 | Rich Features for Perceptual Quality Assessment of UGC Videos | Yilin Wang, Junjie Ke, Hossein Talebi, Joong Gon Yim, Neil Birkbeck, Balu Adsumilli, Peyman Milanfar, Feng Yang |
5317 | A 3D GAN for Improved Large-Pose Facial Recognition | Richard T. Marriott, Sami Romdhani, Liming Chen |
5433 | Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark | Joakim Bruslund Haurum, Thomas B. Moeslund |
6109 | AGORA: Avatars in Geography Optimized for Regression Analysis | Priyanka Patel, Chun-Hao P. Huang, Joachim Tesch, David T. Hoffmann, Shashank Tripathi, Michael J. Black |
8582 | SKFAC: Training Neural Networks With Faster Kronecker-Factored Approximate Curvature | Zedong Tang, Fenlong Jiang, Maoguo Gong, Hao Li, Yue Wu, Fan Yu, Zidong Wang, Min Wang |
6628 | Tree-Like Decision Distillation | Jie Song, Haofei Zhang, Xinchao Wang, Mengqi Xue, Ying Chen, Li Sun, Dacheng Tao, Mingli Song |
7774 | How Does Topology Influence Gradient Propagation and Model Performance of Deep Networks With DenseNet-Type Skip Connections? | Kartikeya Bhardwaj, Guihong Li, Radu Marculescu |
3422 | EnD: Entangling and Disentangling Deep Representations for Bias Correction | Enzo Tartaglione, Carlo Alberto Barbano, Marco Grangetto |
5223 | Learning Decision Trees Recurrently Through Communication | Stephan Alaniz, Diego Marcos, Bernt Schiele, Zeynep Akata |
4928 | Neural Response Interpretation Through the Lens of Critical Pathways | Ashkan Khakzar, Soroosh Baselizadeh, Saurabh Khanduja, Christian Rupprecht, Seong Tae Kim, Nassir Navab |
7469 | Masksembles for Uncertainty Estimation | Nikita Durasov, Timur Bagautdinov, Pierre Baque, Pascal Fua |
3170 | Self-Supervised Video Hashing via Bidirectional Transformers | Shuyan Li, Xiu Li, Jiwen Lu, Jie Zhou |
8530 | 3D Shape Generation With Grid-Based Implicit Functions | Moritz Ibing, Isaak Lim, Leif Kobbelt |
1512 | Positional Encoding As Spatial Inductive Bias in GANs | Rui Xu, Xintao Wang, Kai Chen, Bolei Zhou, Chen Change Loy |
8688 | Blur, Noise, and Compression Robust Generative Adversarial Networks | Takuhiro Kaneko, Tatsuya Harada |
4909 | Learning by Planning: Language-Guided Global Image Editing | Jing Shi, Ning Xu, Yihang Xu, Trung Bui, Franck Dernoncourt, Chenliang Xu |
531 | Teachers Do More Than Teach: Compressing Image-to-Image Models | Qing Jin, Jian Ren, Oliver J. Woodford, Jiazhuo Wang, Geng Yuan, Yanzhi Wang, Sergey Tulyakov |
5205 | Autoregressive Stylized Motion Synthesis With Generative Flow | Yu-Hui Wen, Zhipeng Yang, Hongbo Fu, Lin Gao, Yanan Sun, Yong-Jin Liu |
1325 | MUST-GAN: Multi-Level Statistics Transfer for Self-Driven Person Image Generation | Tianxiang Ma, Bo Peng, Wei Wang, Jing Dong |
1834 | House-GAN++: Generative Adversarial Layout Refinement Network towards Intelligent Computational Agent for Professional Architects | Nelson Nauata, Sepidehsadat Hosseini, Kai-Hung Chang, Hang Chu, Chin-Yi Cheng, Yasutaka Furukawa |
1234 | Variational Transformer Networks for Layout Generation | Diego Martín Arroyo, Janis Postels, Federico Tombari |
11392 | Motion Representations for Articulated Animation | Aliaksandr Siarohin, Oliver J. Woodford, Jian Ren, Menglei Chai, Sergey Tulyakov |
10917 | Pareto Self-Supervised Training for Few-Shot Learning | Zhengyu Chen, Jixie Ge, Heshen Zhan, Siteng Huang, Donglin Wang |
3045 | RaScaNet: Learning Tiny Models by Raster-Scanning Images | Jaehyoung Yoo, Dongwook Lee, Changyong Son, Sangil Jung, ByungIn Yoo, Changkyu Choi, Jae-Joon Han, Bohyung Han |
7379 | AlphaMatch: Improving Consistency for Semi-Supervised Learning With Alpha-Divergence | Chengyue Gong, Dilin Wang, Qiang Liu |
4001 | Nearest Neighbor Matching for Deep Clustering | Zhiyuan Dang, Cheng Deng, Xu Yang, Kun Wei, Heng Huang |
1985 | DeepACG: Co-Saliency Detection via Semantic-Aware Contrast Gromov-Wasserstein Distance | Kaihua Zhang, Mingliang Dong, Bo Liu, Xiao-Tong Yuan, Qingshan Liu |
2807 | Coordinate Attention for Efficient Mobile Network Design | Qibin Hou, Daquan Zhou, Jiashi Feng |
1157 | Landmark Regularization: Ranking Guided Super-Net Training in Neural Architecture Search | Kaicheng Yu, René Ranftl, Mathieu Salzmann |
7140 | RepVGG: Making VGG-Style ConvNets Great Again | Xiaohan Ding, Xiangyu Zhang, Ningning Ma, Jungong Han, Guiguang Ding, Jian Sun |
3841 | 3D Graph Anatomy Geometry-Integrated Network for Pancreatic Mass Segmentation, Diagnosis, and Quantitative Patient Management | Tianyi Zhao, Kai Cao, Jiawen Yao, Isabella Nogues, Le Lu, Lingyun Huang, Jing Xiao, Zhaozheng Yin, Ling Zhang |
789 | Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation | Fenglin Liu, Xian Wu, Shen Ge, Wei Fan, Yuexian Zou |
1435 | Towards More Flexible and Accurate Object Tracking With Natural Language: Algorithms and Benchmark | Xiao Wang, Xiujun Shu, Zhipeng Zhang, Bo Jiang, Yaowei Wang, Yonghong Tian, Feng Wu |
7848 | STMTrack: Template-Free Visual Tracking With Space-Time Memory Networks | Zhihong Fu, Qingjie Liu, Zehua Fu, Yunhong Wang |
6241 | DyGLIP: A Dynamic Graph Model With Link Prediction for Accurate Multi-Camera Multiple Object Tracking | Kha Gia Quach, Pha Nguyen, Huu Le, Thanh-Dat Truong, Chi Nhan Duong, Minh-Triet Tran, Khoa Luu |
3387 | SuperMix: Supervising the Mixing Data Augmentation | Ali Dabouei, Sobhan Soleymani, Fariborz Taherkhani, Nasser M. Nasrabadi |
1924 | Monte Carlo Scene Search for 3D Scene Understanding | Shreyas Hampali, Sinisa Stekovic, Sayan Deb Sarkar, Chetan S. Kumar, Friedrich Fraundorfer, Vincent Lepetit |
6437 | MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation | Sanjay Kariyappa, Atul Prakash, Moinuddin K Qureshi |
955 | Visualizing Adapted Knowledge in Domain Transfer | Yunzhong Hou, Liang Zheng |
3431 | Prototypical Cross-Domain Self-Supervised Learning for Few-Shot Unsupervised Domain Adaptation | Xiangyu Yue, Zangwei Zheng, Shanghang Zhang, Yang Gao, Trevor Darrell, Kurt Keutzer, Alberto Sangiovanni Vincentelli |
10462 | KSM: Fast Multiple Task Adaption via Kernel-Wise Soft Mask Learning | Li Yang, Zhezhi He, Junshan Zhang, Deliang Fan |
1510 | Picasso: A CUDA-Based Library for Deep Learning Over 3D Meshes | Huan Lei, Naveed Akhtar, Ajmal Mian |
6646 | Efficient Feature Transformations for Discriminative and Generative Continual Learning | Vinay Kumar Verma, Kevin J Liang, Nikhil Mehta, Piyush Rai, Lawrence Carin |
8858 | Spatial Assembly Networks for Image Representation Learning | Yang Li, Shichao Kan, Jianhe Yuan, Wenming Cao, Zhihai He |
1846 | Self-Supervised Video Representation Learning by Context and Motion Decoupling | Lianghua Huang, Yu Liu, Bin Wang, Pan Pan, Yinghui Xu, Rong Jin |
2036 | Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression | Wanhua Li, Xiaoke Huang, Jiwen Lu, Jianjiang Feng, Jie Zhou |
1728 | CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching | Zhelun Shen, Yuchao Dai, Zhibo Rao |
10892 | Depth Completion Using Plane-Residual Representation | Byeong-Uk Lee, Kyunghyun Lee, In So Kweon |
4697 | Look Closer To Segment Better: Boundary Patch Refinement for Instance Segmentation | Chufeng Tang, Hang Chen, Xiao Li, Jianmin Li, Zhaoxiang Zhang, Xiaolin Hu |
5562 | Energy-Based Learning for Scene Graph Generation | Mohammed Suhail, Abhay Mittal, Behjat Siddiquie, Chris Broaddus, Jayan Eledath, Gerard Medioni, Leonid Sigal |
2178 | Heterogeneous Grid Convolution for Adaptive, Efficient, and Controllable Computation | Ryuhei Hamaguchi, Yasutaka Furukawa, Masaki Onishi, Ken Sakurada |
4537 | DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation | Xiong Zhang, Hongmin Xu, Hong Mo, Jianchao Tan, Cheng Yang, Lei Wang, Wenqi Ren |
2546 | Weakly Supervised Instance Segmentation for Videos With Temporal Mask Consistency | Qing Liu, Vignesh Ramanathan, Dhruv Mahajan, Alan Yuille, Zhenheng Yang |
10766 | Few-Shot Segmentation Without Meta-Learning: A Good Transductive Inference Is All You Need? | Malik Boudiaf, Hoel Kervadec, Ziko Imtiaz Masud, Pablo Piantanida, Ismail Ben Ayed, Jose Dolz |
3520 | Conditional Bures Metric for Domain Adaptation | You-Wei Luo, Chuan-Xian Ren |
5175 | Relative Order Analysis and Optimization for Unsupervised Deep Metric Learning | Shichao Kan, Yigang Cen, Yang Li, Vladimir Mladenovic, Zhihai He |
1937 | MIST: Multiple Instance Self-Training Framework for Video Anomaly Detection | Jia-Chang Feng, Fa-Ting Hong, Wei-Shi Zheng |
4873 | Patch-VQ: ‘Patching Up’ the Video Quality Problem | Zhenqiang Ying, Maniratnam Mandal, Deepti Ghadiyaram, Alan Bovik |
10743 | Boosting Video Representation Learning With Multi-Faceted Integration | Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Xiao-Ping Zhang, Dong Wu, Tao Mei |
3294 | Delving Deep Into Many-to-Many Attention for Few-Shot Video Object Segmentation | Haoxin Chen, Hanjie Wu, Nanxuan Zhao, Sucheng Ren, Shengfeng He |
2690 | FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation | Sijin Wang, Ziwei Yao, Ruiping Wang, Zhongqin Wu, Xilin Chen |
2869 | Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning | Mingjie Sun, Jimin Xiao, Eng Gee Lim |
265 | Repetitive Activity Counting by Sight and Sound | Yunhua Zhang, Ling Shao, Cees G. M. Snoek |
822 | Audio-Driven Emotional Video Portraits | Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu |
4853 | Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation | Liwei Wang, Jing Huang, Yin Li, Kun Xu, Zhengyuan Yang, Dong Yu |
6533 | Hierarchical and Partially Observable Goal-Driven Policy Learning With Goals Relational Graph | Xin Ye, Yezhou Yang |
7420 | KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA | Kenneth Marino, Xinlei Chen, Devi Parikh, Abhinav Gupta, Marcus Rohrbach |
2468 | Focus on Local: Detecting Lane Marker From Bottom Up via Key Point | Zhan Qu, Huan Jin, Yang Zhou, Zhen Yang, Wei Zhang |
6162 | VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization | Seunghwan Choi, Sunghyun Park, Minsoo Lee, Jaegul Choo |
4969 | Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition | Stephen Hausler, Sourav Garg, Ming Xu, Michael Milford, Tobias Fischer |
911 | DeRF: Decomposed Radiance Fields | Daniel Rebain, Wei Jiang, Soroosh Yazdani, Ke Li, Kwang Moo Yi, Andrea Tagliasacchi |