MAIN CONFERENCE
All papers will be presented in the same manner. Each paper will have a five-minute pre-recorded video and a PDF of the poster. An asynchronous text chat will be available for each paper. Attendees can view the papers and videos on demand at any time. Authors will also hold individual Q&A sessions at the posted times below.
All posted times are in EDT; the chart linked below gives the conversions for all time zones. Once the virtual site is up, you will be able to select the sessions you are interested in, and the site will populate your personal schedule.
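For attendees outside the Eastern time zone, the conversion from EDT can also be worked out directly. The following is a minimal sketch, not part of the conference tooling: the helper name `convert_session` and the choice of target zones (UTC, CEST, JST) are assumptions for illustration, and fixed UTC offsets valid for June 2021 are used instead of a time zone database.

```python
from datetime import datetime, timedelta, timezone

# EDT is UTC-4. Fixed offsets keep the sketch free of any time zone database.
EDT = timezone(timedelta(hours=-4), "EDT")

# Illustrative target zones (assumptions; offsets are those in effect in June 2021).
TARGETS = {
    "UTC": timezone.utc,
    "CEST": timezone(timedelta(hours=2), "CEST"),
    "JST": timezone(timedelta(hours=9), "JST"),
}


def convert_session(start, end):
    """Return {zone name: (start, end)} for a session whose times are given in EDT."""
    return {
        name: (start.astimezone(tz), end.astimezone(tz))
        for name, tz in TARGETS.items()
    }


# Paper Session Four: Tuesday, June 22, 2021, 11:00 to 13:30 EDT
session_start = datetime(2021, 6, 22, 11, 0, tzinfo=EDT)
session_end = datetime(2021, 6, 22, 13, 30, tzinfo=EDT)

for name, (start, end) in convert_session(session_start, session_end).items():
    print(f"{name}: {start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M}")
```

Under these assumptions, Paper Session Four runs from 15:00 to 17:30 UTC, which falls on the following calendar day (June 23) for attendees in Japan.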
Presentation Schedule
All times are Eastern Daylight Time
Date: Tuesday, June 22, 2021 11:00 – 13:30
Paper Session Four:
Paper ID | Paper Title | Authors |
8890 | Line Segment Detection Using Transformers Without Edges | Yifan Xu, Weijian Xu, David Cheung, Zhuowen Tu |
1139 | Predator: Registration of 3D Point Clouds With Low Overlap | Shengyu Huang, Zan Gojcic, Mikhail Usvyatsov, Andreas Wieser, Konrad Schindler |
1824 | Point2Skeleton: Learning Skeletal Representations from Point Clouds | Cheng Lin, Changjian Li, Yuan Liu, Nenglun Chen, Yi-King Choi, Wenping Wang |
4945 | Neural Lumigraph Rendering | Petr Kellnhofer, Lars C. Jebe, Andrew Jones, Ryan Spicer, Kari Pulli, Gordon Wetzstein |
2419 | Rotation Coordinate Descent for Fast Globally Optimal Rotation Averaging | Alvaro Parra, Shin-Fang Chng, Tat-Jun Chin, Anders Eriksson, Ian Reid |
3659 | Towards Evaluating and Training Verifiably Robust Neural Networks | Zhaoyang Lyu, Minghao Guo, Tong Wu, Guodong Xu, Kehuan Zhang, Dahua Lin |
1929 | Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-Localization in Large Scenes From Body-Mounted Sensors | Vladimir Guzov, Aymen Mir, Torsten Sattler, Gerard Pons-Moll |
3358 | Discover Cross-Modality Nuances for Visible-Infrared Person Re-Identification | Qiong Wu, Pingyang Dai, Jie Chen, Chia-Wen Lin, Yongjian Wu, Feiyue Huang, Bineng Zhong, Rongrong Ji |
1427 | Dual Pixel Exploration: Simultaneous Depth Estimation and Image Restoration | Liyuan Pan, Shah Chowdhury, Richard Hartley, Miaomiao Liu, Hongguang Zhang, Hongdong Li |
3586 | Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets | Yuan-Hong Liao, Amlan Kar, Sanja Fidler |
1619 | ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis | Yinan He, Bei Gan, Siyu Chen, Yichun Zhou, Guojun Yin, Luchuan Song, Lu Sheng, Jing Shao, Ziwei Liu |
5085 | Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos | Jiawei Liu, Zheng-Jun Zha, Wei Wu, Kecheng Zheng, Qibin Sun |
5822 | SSN: Soft Shadow Network for Image Compositing | Yichen Sheng, Jianming Zhang, Bedrich Benes |
11788 | Soft-IntroVAE: Analyzing and Improving the Introspective Variational Autoencoder | Tal Daniel, Aviv Tamar |
4081 | Learning Placeholders for Open-Set Recognition | Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan |
1896 | ReNAS: Relativistic Evaluation of Neural Architecture Search | Yixing Xu, Yunhe Wang, Kai Han, Yehui Tang, Shangling Jui, Chunjing Xu, Chang Xu |
4117 | Learning To Filter: Siamese Relation Network for Robust Tracking | Siyuan Cheng, Bineng Zhong, Guorong Li, Xin Liu, Zhenjun Tang, Xianxian Li, Jing Wang |
2644 | Generative Hierarchical Features From Synthesizing Images | Yinghao Xu, Yujun Shen, Jiapeng Zhu, Ceyuan Yang, Bolei Zhou |
8086 | Continual Adaptation of Visual Representations via Domain Randomization and Meta-Learning | Riccardo Volpi, Diane Larlus, Grégory Rogez |
8866 | NewtonianVAE: Proportional Control and Goal Identification From Pixels via Physical Latent Spaces | Miguel Jaques, Michael Burke, Timothy M. Hospedales |
6073 | 3D-to-2D Distillation for Indoor Scene Parsing | Zhengzhe Liu, Xiaojuan Qi, Chi-Wing Fu |
10030 | Repurposing GANs for One-Shot Semantic Part Segmentation | Nontawat Tritrong, Pitchaporn Rewatbowornwong, Supasorn Suwajanakorn |
2958 | Temporal Query Networks for Fine-Grained Video Understanding | Chuhan Zhang, Ankush Gupta, Andrew Zisserman |
3887 | ManipulaTHOR: A Framework for Visual Object Manipulation | Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi |
344 | Omnimatte: Associating Objects and Their Effects in Video | Erika Lu, Forrester Cole, Tali Dekel, Andrew Zisserman, William T. Freeman, Michael Rubinstein |
582 | MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection | Vibashan VS, Vikram Gupta, Poojan Oza, Vishwanath A. Sindagi, Vishal M. Patel |
2564 | Generalized Few-Shot Object Detection Without Forgetting | Zhibo Fan, Yuchen Ma, Zeming Li, Jian Sun |
6392 | DAP: Detection-Aware Pre-Training With Weak Supervision | Yuanyi Zhong, Jianfeng Wang, Lijuan Wang, Jian Peng, Yu-Xiong Wang, Lei Zhang |
5538 | A Multiplexed Network for End-to-End, Multilingual OCR | Jing Huang, Guan Pang, Rama Kovvuri, Mandy Toh, Kevin J Liang, Praveen Krishnan, Xi Yin, Tal Hassner |
6276 | Scene Text Retrieval via Joint Text Detection and Similarity Learning | Hao Wang, Xiang Bai, Mingkun Yang, Shenggao Zhu, Jing Wang, Wenyu Liu |
1987 | Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection | Zhenyu Wang, Yali Li, Ye Guo, Lu Fang, Shengjin Wang |
1291 | pixelNeRF: Neural Radiance Fields From One or Few Images | Alex Yu, Vickie Ye, Matthew Tancik, Angjoo Kanazawa |
4891 | From Points to Multi-Object 3D Reconstruction | Francis Engelmann, Konstantinos Rematas, Bastian Leibe, Vittorio Ferrari |
3949 | 4D Hyperspectral Photoacoustic Data Restoration With Reliability Analysis | Weihang Liao, Art Subpa-asa, Yinqiang Zheng, Imari Sato |
2306 | RfD-Net: Point Scene Understanding by Semantic Instance Reconstruction | Yinyu Nie, Ji Hou, Xiaoguang Han, Matthias Nießner |
4083 | Style-Based Point Generator With Adversarial Rendering for Point Cloud Completion | Chulin Xie, Chuxin Wang, Bo Zhang, Hao Yang, Dong Chen, Fang Wen |
7033 | Denoise and Contrast for Category Agnostic Shape Completion | Antonio Alliegro, Diego Valsesia, Giulia Fracastoro, Enrico Magli, Tatiana Tommasi |
8562 | Neural Surface Maps | Luca Morreale, Noam Aigerman, Vladimir G. Kim, Niloy J. Mitra |
3438 | RGB-D Local Implicit Function for Depth Completion of Transparent Objects | Luyang Zhu, Arsalan Mousavian, Yu Xiang, Hammad Mazhar, Jozef van Eenbergen, Shoubhik Debnath, Dieter Fox |
10157 | Uncertainty-Aware Camera Pose Estimation From Points and Lines | Alexander Vakhitov, Luis Ferraz, Antonio Agudo, Francesc Moreno-Noguer |
2491 | Patch2Pix: Epipolar-Guided Pixel-Level Correspondences | Qunjie Zhou, Torsten Sattler, Laura Leal-Taixé |
7332 | Deep Multi-Task Learning for Joint Localization, Perception, and Prediction | John Phillips, Julieta Martinez, Ioan Andrei Bârsan, Sergio Casas, Abbas Sadat, Raquel Urtasun |
4114 | IBRNet: Learning Multi-View Image-Based Rendering | Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul P. Srinivasan, Howard Zhou, Jonathan T. Barron, Ricardo Martin-Brualla, Noah Snavely, Thomas Funkhouser |
7304 | Unsupervised Learning of 3D Object Categories From Videos in the Wild | Philipp Henzler, Jeremy Reizenstein, Patrick Labatut, Roman Shapovalov, Tobias Ritschel, Andrea Vedaldi, David Novotny |
754 | LiDAR-Aug: A General Rendering-Based Augmentation Framework for 3D Object Detection | Jin Fang, Xinxin Zuo, Dingfu Zhou, Shengze Jin, Sen Wang, Liangjun Zhang |
3226 | Delving Into Localization Errors for Monocular 3D Object Detection | Xinzhu Ma, Yinmin Zhang, Dan Xu, Dongzhan Zhou, Shuai Yi, Haojie Li, Wanli Ouyang |
1652 | 3D CNNs With Adaptive Temporal Feature Resolutions | Mohsen Fayyaz, Emad Bahrami, Ali Diba, Mehdi Noroozi, Ehsan Adeli, Luc Van Gool, Jürgen Gall |
313 | 3D Human Action Representation Learning via Cross-View Consistency Pursuit | Linguo Li, Minsi Wang, Bingbing Ni, Hang Wang, Jiancheng Yang, Wenjun Zhang |
1723 | Three Birds with One Stone: Multi-Task Temporal Action Detection via Recycling Temporal Annotations | Zhihui Li, Lina Yao |
960 | Delving into Data: Effectively Substitute Training for Black-box Attack | Wenxuan Wang, Bangjie Yin, Taiping Yao, Li Zhang, Yanwei Fu, Shouhong Ding, Jilin Li, Feiyue Huang, Xiangyang Xue |
7686 | Data-Free Model Extraction | Jean-Baptiste Truong, Pratyush Maini, Robert J. Walls, Nicolas Papernot |
7269 | Adaptive Weighted Discriminator for Training Generative Adversarial Networks | Vasily Zadorozhnyy, Qiang Cheng, Qiang Ye |
1898 | Monocular Reconstruction of Neural Face Reflectance Fields | Mallikarjun B R, Ayush Tewari, Tae-Hyun Oh, Tim Weyrich, Bernd Bickel, Hans-Peter Seidel, Hanspeter Pfister, Wojciech Matusik, Mohamed Elgharib, Christian Theobalt |
2267 | Towards Accurate 3D Human Motion Prediction From Incomplete Observations | Qiongjie Cui, Huaijiang Sun |
1429 | Monocular Real-Time Full Body Capture With Inter-Part Correlations | Yuxiao Zhou, Marc Habermann, Ikhsanul Habibie, Ayush Tewari, Christian Theobalt, Feng Xu |
6178 | Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting | Lingbo Liu, Jiaqi Chen, Hefeng Wu, Guanbin Li, Chenglong Li, Liang Lin |
2827 | One Shot Face Swapping on Megapixels | Yuhao Zhu, Qi Li, Jian Wang, Cheng-Zhong Xu, Zhenan Sun |
2974 | Dynamic Probabilistic Graph Convolution for Facial Action Unit Intensity Estimation | Tengfei Song, Zijun Cui, Yuru Wang, Wenming Zheng, Qiang Ji |
815 | Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification | Fengxiang Yang, Zhun Zhong, Zhiming Luo, Yuanzheng Cai, Yaojin Lin, Shaozi Li, Nicu Sebe |
10905 | Prototype-Guided Saliency Feature Learning for Person Search | Hanjae Kim, Sunghun Joung, Ig-Jae Kim, Kwanghoon Sohn |
3944 | Labeled From Unlabeled: Exploiting Unlabeled Data for Few-Shot Deep HDR Deghosting | K. Ram Prabhakar, Gowtham Senthil, Susmit Agrawal, R. Venkatesh Babu, Rama Krishna Sai S Gorthi |
2053 | Learning Spatially-Variant MAP Models for Non-Blind Image Deblurring | Jiangxin Dong, Stefan Roth, Bernt Schiele |
7780 | NBNet: Noise Basis Learning for Image Denoising With Subspace Projection | Shen Cheng, Yuzhi Wang, Haibin Huang, Donghao Liu, Haoqiang Fan, Shuaicheng Liu |
6177 | Image De-Raining via Continual Learning | Man Zhou, Jie Xiao, Yifan Chang, Xueyang Fu, Aiping Liu, Jinshan Pan, Zheng-Jun Zha |
6156 | Exploring Sparsity in Image Super-Resolution for Efficient Inference | Longguang Wang, Xiaoyu Dong, Yingqian Wang, Xinyi Ying, Zaiping Lin, Wei An, Yulan Guo |
4280 | From Shadow Generation To Shadow Removal | Zhihao Liu, Hui Yin, Xinyi Wu, Zhenyao Wu, Yang Mi, Song Wang |
1962 | Spatiotemporal Registration for Event-Based Visual Odometry | Daqi Liu, Alvaro Parra, Tat-Jun Chin |
302 | BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond | Kelvin C.K. Chan, Xintao Wang, Ke Yu, Chao Dong, Chen Change Loy |
3343 | Fast Bayesian Uncertainty Estimation and Reduction of Batch Normalized Single Image Super-Resolution Network | Aupendu Kar, Prabir Kumar Biswas |
2552 | Learning Temporal Consistency for Low Light Video Enhancement From Single Images | Fan Zhang, Yu Li, Shaodi You, Ying Fu |
2107 | Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges | Qingyong Hu, Bo Yang, Sheikh Khalid, Wen Xiao, Niki Trigoni, Andrew Markham |
8839 | Neural Side-by-Side: Predicting Human Preferences for No-Reference Super-Resolution Evaluation | Valentin Khrulkov, Artem Babenko |
5369 | Slimmable Compressive Autoencoders for Practical Neural Image Compression | Fei Yang, Luis Herranz, Yongmei Cheng, Mikhail G. Mozerov |
1390 | Distilling Knowledge via Knowledge Review | Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia |
1043 | Manifold Regularized Dynamic Network Pruning | Yehui Tang, Yunhe Wang, Yixing Xu, Yiping Deng, Chao Xu, Dacheng Tao, Chang Xu |
2224 | Learnable Companding Quantization for Accurate Low-Bit Neural Networks | Kohei Yamamoto |
10100 | Lips Don’t Lie: A Generalisable and Robust Approach To Face Forgery Detection | Alexandros Haliassos, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic |
10338 | Guided Integrated Gradients: An Adaptive Path Method for Removing Noise | Andrei Kapishnikov, Subhashini Venugopalan, Besim Avci, Ben Wedin, Michael Terry, Tolga Bolukbasi |
3934 | Scalable Differential Privacy With Sparse Network Finetuning | Zelun Luo, Daniel J. Wu, Ehsan Adeli, Li Fei-Fei |
2115 | Deep Graph Matching Under Quadratic Constraint | Quankai Gao, Fudong Wang, Nan Xue, Jin-Gang Yu, Gui-Song Xia |
8225 | T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval | Xiaohan Wang, Linchao Zhu, Yi Yang |
5722 | FaceInpainter: High Fidelity Face Adaptation to Heterogeneous Domains | Jia Li, Zhaoyang Li, Jie Cao, Xingguang Song, Ran He |
5597 | Partition-Guided GANs | Mohammadreza Armandpour, Ali Sadeghian, Chunyuan Li, Mingyuan Zhou |
1451 | Repopulating Street Scenes | Yifan Wang, Andrew Liu, Richard Tucker, Jiajun Wu, Brian L. Curless, Steven M. Seitz, Noah Snavely |
2855 | Image Inpainting With External-Internal Learning and Monochromic Bottleneck | Tengfei Wang, Hao Ouyang, Qifeng Chen |
2878 | DG-Font: Deformable Generative Networks for Unsupervised Font Generation | Yangchen Xie, Xinyuan Chen, Li Sun, Yue Lu |
7839 | Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer | Tianwei Lin, Zhuoqi Ma, Fu Li, Dongliang He, Xin Li, Errui Ding, Nannan Wang, Jie Li, Xinbo Gao |
5214 | StylePeople: A Generative Model of Fullbody Human Avatars | Artur Grigorev, Karim Iskakov, Anastasia Ianina, Renat Bashirov, Ilya Zakharkin, Alexander Vakhitov, Victor Lempitsky |
11062 | Synthesize-It-Classifier: Learning a Generative Classifier Through Recurrent Self-Analysis | Arghya Pal, Raphaël C.-W. Phan, KokSheik Wong |
244 | Understanding Object Dynamics for Interactive Image-to-Video Synthesis | Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer |
5839 | Learning Dynamic Alignment via Meta-Filter for Few-Shot Learning | Chengming Xu, Yanwei Fu, Chen Liu, Chengjie Wang, Jilin Li, Feiyue Huang, Li Zhang, Xiangyang Xue |
2175 | Jo-SRC: A Contrastive Approach for Combating Noisy Labels | Yazhou Yao, Zeren Sun, Chuanyi Zhang, Fumin Shen, Qi Wu, Jian Zhang, Zhenmin Tang |
4599 | On Focal Loss for Class-Posterior Probability Estimation: A Theoretical Perspective | Nontawat Charoenphakdee, Jayakorn Vongkulbhisal, Nuttapong Chairatanakul, Masashi Sugiyama |
6434 | MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition | Shuang Li, Kaixiong Gong, Chi Harold Liu, Yulin Wang, Feng Qiao, Xinjing Cheng |
7256 | Open World Compositional Zero-Shot Learning | Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, Zeynep Akata |
6107 | Deep Texture Recognition via Exploiting Cross-Layer Statistical Self-Similarity | Zhile Chen, Feng Li, Yuhui Quan, Yong Xu, Hui Ji |
3260 | Combinatorial Learning of Graph Edit Distance via Dynamic Embedding | Runzhong Wang, Tianqi Zhang, Tianshu Yu, Junchi Yan, Xiaokang Yang |
7448 | TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search | Yawen Duan, Xin Chen, Hang Xu, Zewei Chen, Xiaodan Liang, Tong Zhang, Zhenguo Li |
10123 | An Alternative Probabilistic Interpretation of the Huber Loss | Gregory P. Meyer |
7795 | Joint Deep Model-Based MR Image and Coil Sensitivity Reconstruction Network (Joint-ICNet) for Fast MRI | Yohan Jun, Hyungseob Shin, Taejoon Eo, Dosik Hwang |
6230 | Automatic Vertebra Localization and Identification in CT by Spine Rectification and Anatomically-Constrained Optimization | Fakai Wang, Kang Zheng, Le Lu, Jing Xiao, Min Wu, Shun Miao |
3472 | Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation | Bin Yan, Xinyu Zhang, Dong Wang, Huchuan Lu, Xiaoyun Yang |
1576 | Learnable Graph Matching: Incorporating Graph Partitioning With Deep Feature Learning for Multiple Object Tracking | Jiawei He, Zehao Huang, Naiyan Wang, Zhaoxiang Zhang |
1821 | Group-aware Label Transfer for Domain Adaptive Person Re-identification | Kecheng Zheng, Wu Liu, Lingxiao He, Tao Mei, Jiebo Luo, Zheng-Jun Zha |
8382 | Double Low-Rank Representation With Projection Distance Penalty for Clustering | Zhiqiang Fu, Yao Zhao, Dongxia Chang, Xingxing Zhang, Yiming Wang |
453 | Multiple Instance Active Learning for Object Detection | Tianning Yuan, Fang Wan, Mengying Fu, Jianzhuang Liu, Songcen Xu, Xiangyang Ji, Qixiang Ye |
2199 | Learning Compositional Representation for 4D Captures With Neural ODE | Boyan Jiang, Yinda Zhang, Xingkui Wei, Xiangyang Xue, Yanwei Fu |
2879 | Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation | Subhankar Roy, Evgeny Krivosheev, Zhun Zhong, Nicu Sebe, Elisa Ricci |
4872 | Instance Level Affinity-Based Transfer for Unsupervised Domain Adaptation | Astuti Sharma, Tarun Kalluri, Manmohan Chandraker |
3798 | Deep Stable Learning for Out-of-Distribution Generalization | Xingxuan Zhang, Peng Cui, Renzhe Xu, Linjun Zhou, Yue He, Zheyan Shen |
3143 | ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-Supervised Continual Learning | Liyuan Wang, Kuo Yang, Chongxuan Li, Lanqing Hong, Zhenguo Li, Jun Zhu |
2689 | Dynamic Metric Learning: Towards a Scalable Metric Space To Accommodate Multiple Semantic Scales | Yifan Sun, Yuke Zhu, Yuhan Zhang, Pengkun Zheng, Xi Qiu, Chi Zhang, Yichen Wei |
8722 | Learning Cross-Modal Retrieval With Noisy Labels | Peng Hu, Xi Peng, Hongyuan Zhu, Liangli Zhen, Jie Lin |
7087 | How Well Do Self-Supervised Models Transfer? | Linus Ericsson, Henry Gouk, Timothy M. Hospedales |
2643 | Generic Perceptual Loss for Modeling Structured Output Dependencies | Yifan Liu, Hao Chen, Yu Chen, Wei Yin, Chunhua Shen |
7564 | EDNet: Efficient Disparity Estimation With Cost Volume Combination and Attention-Based Spatial Residual | Songyan Zhang, Zhicheng Wang, Qiang Wang, Jinshuo Zhang, Gang Wei, Xiaowen Chu |
1909 | BoxInst: High-Performance Instance Segmentation With Box Annotations | Zhi Tian, Chunhua Shen, Xinlong Wang, Hao Chen |
2060 | PhySG: Inverse Rendering With Spherical Gaussians for Physics-Based Material Editing and Relighting | Kai Zhang, Fujun Luan, Qianqian Wang, Kavita Bala, Noah Snavely |
458 | MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers | Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen |
3092 | Scale-Aware Graph Neural Network for Few-Shot Semantic Segmentation | Guo-Sen Xie, Jie Liu, Huan Xiong, Ling Shao |
8742 | Part-Aware Panoptic Segmentation | Daan de Geus, Panagiotis Meletis, Chenyang Lu, Xiaoxiao Wen, Gijs Dubbelman |
4107 | Railroad Is Not a Train: Saliency As Pseudo-Pixel Supervision for Weakly Supervised Semantic Segmentation | Seungho Lee, Minhyun Lee, Jongwuk Lee, Hyunjung Shim |
6035 | Mask-Embedded Discriminator With Region-Based Semantic Regularization for Semi-Supervised Class-Conditional Image Synthesis | Yi Liu, Xiaoyang Huo, Tianyi Chen, Xiangping Zeng, Si Wu, Zhiwen Yu, Hau-San Wong |
1953 | Unsupervised Hyperbolic Representation Learning via Message Passing Auto-Encoders | Jiwoong Park, Junho Cho, Hyung Jin Chang, Jin Young Choi |
3346 | 4D Panoptic LiDAR Segmentation | Mehmet Aygün, Aljoša Ošep, Mark Weber, Maxim Maximov, Cyrill Stachniss, Jens Behley, Laura Leal-Taixé |
3147 | EffiScene: Efficient Per-Pixel Rigidity Inference for Unsupervised Joint Learning of Optical Flow, Depth, Camera Pose and Motion Segmentation | Yang Jiao, Trac D. Tran, Guangming Shi |
2976 | Learning by Aligning Videos in Time | Sanjay Haresh, Sateesh Kumar, Huseyin Coskun, Shahram N. Syed, Andrey Konin, Zeeshan Zia, Quoc-Huy Tran |
1786 | Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion | Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang |
5785 | Polygonal Point Set Tracking | Gunhee Nam, Miran Heo, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim |
7322 | VinVL: Revisiting Visual Representations in Vision-Language Models | Pengchuan Zhang, Xiujun Li, Xiaowei Hu, Jianwei Yang, Lei Zhang, Lijuan Wang, Yejin Choi, Jianfeng Gao |
3268 | Visual Semantic Role Labeling for Video Understanding | Arka Sadhu, Tanmay Gupta, Mark Yatskar, Ram Nevatia, Aniruddha Kembhavi |
2308 | Can Audio-Visual Integration Strengthen Robustness Under Multimodal Attacks? | Yapeng Tian, Chenliang Xu |
1835 | Relation-aware Instance Refinement for Weakly Supervised Visual Grounding | Yongfei Liu, Bo Wan, Lin Ma, Xuming He |
10687 | Learning Better Visual Dialog Agents With Pretrained Visual-Linguistic Representation | Tao Tu, Qing Ping, Govindarajan Thattai, Gokhan Tur, Prem Natarajan |
1154 | Separating Skills and Concepts for Novel Visual Question Answering | Spencer Whitehead, Hui Wu, Heng Ji, Rogerio Feris, Kate Saenko |
2281 | Generating Manga From Illustrations via Mimicking Manga Creation Workflow | Lvmin Zhang, Xinrui Wang, Qingnan Fan, Yi Ji, Chunping Liu |
3476 | SelfDoc: Self-Supervised Document Representation Learning | Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Rajiv Jain, Varun Manjunatha, Hongfu Liu |
11090 | Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality | Trisha Mittal, Puneet Mathur, Aniket Bera, Dinesh Manocha |
2542 | Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting | Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song |