
CV & AIGC Top-Conference Roundup [2024-10-22]

Source: 晓飞的算法工程笔记 (Xiaofei's Algorithm Engineering Notes)

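Today's update: 48 papers

  • Computer vision conferences: 22 papers
  • Natural language processing conferences: 26 papers
Note that papers on large models are mostly published at natural language processing conferences. Because multimodal research is developing rapidly, some computer-vision papers also appear at top NLP conferences.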

Computer vision conferences: 22 papers


[0] Optimizing Parking Space Classification: Distilling Ensembles into Lightweight Classifiers[cs.CV]
Authors: Paulo Luza Alves, André Hochuli, Luiz Eduardo de Oliveira, Paulo Lisboa de Almeida
Link: http://arxiv.org/abs/2410.14705
Comments: Accepted for presentation at the International Conference on Machine Learning and Applications (ICMLA) 2024

[1] Non-Invasive to Invasive: Enhancing FFA Synthesis from CFP with a Benchmark Dataset and a Novel Network[cs.CV]
Authors: Hongqiu Wang, Zhaohu Xing, Weitong Wu, Yijun Yang, Qingqing Tang, Meixia Zhang, Yanwu Xu, Lei Zhu
Link: http://arxiv.org/abs/2410.14965
Code: https://github.com/whq-xxh/FFA-Synthesis
Comments: ACMMM 24 MCHM

[2] DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain[cs.CV]
Authors: Kun Wang, Zhiqiang Yan, Junkai Fan, Wanlu Zhu, Xiang Li, Jun Li, Jian Yang
Link: http://arxiv.org/abs/2410.14980
Code: https://github.com/w2kun/DCDepth
Comments: Accepted by NeurIPS 2024

[3] Quanta Video Restoration[cs.CV]
Authors: Prateek Chennuri, Yiheng Chi, Enze Jiang, G. M. Dilshan Godaliyadda, Abhiram Gnanasambandam, Hamid R. Sheikh, Istvan Gyongy, Stanley H. Chan
Link: http://arxiv.org/abs/2410.14994
Code: https://github.com/chennuriprateek/Quanta_Video_Restoration-QUIVER-
Venue: European Conference on Computer Vision (ECCV) 2024

[4] How Many Van Goghs Does It Take to Van Gogh? Finding the Imitation Threshold[cs.CV]
Authors: Sahil Verma, Royi Rassin, Arnav Das, Gantavya Bhatt, Preethi Seshadri, Chirag Shah, Jeff Bilmes, Hannaneh Hajishirzi, Yanai Elazar
Link: http://arxiv.org/abs/2410.15002
Code: https://github.com/vsahil/MIMETIC-2.git
Comments: Accepted at ATTRIB, RegML, and SafeGenAI workshops at NeurIPS 2024 and NLLP Workshop 2024

[5] DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer[cs.CV]
Authors: Ying Hu, Chenyi Zhuang, Pan Gao
Link: http://arxiv.org/abs/2410.15007
Code: https://github.com/I2-Multimedia-Lab/DiffuseST
Comments: Accepted to ACMMM Asia 2024

[6] Scene Graph Generation with Role-Playing Large Language Models[cs.CV]
Authors: Guikun Chen, Jin Li, Wenguan Wang
Link: http://arxiv.org/abs/2410.15364
Code: https://github.com/guikunchen/SDSGG
Comments: NeurIPS 2024

[7] IPO: Interpretable Prompt Optimization for Vision-Language Models[cs.CV]
Authors: Yingjun Du, Wenfang Sun, Cees G. M. Snoek
Link: http://arxiv.org/abs/2410.15397
Comments: Accepted by NeurIPS 2024

[8] BoostAdapter: Improving Test-Time Adaptation via Regional Bootstrapping[cs.CV]
Authors: Taolin Zhang, Jinpeng Wang, Hang Guo, Tao Dai, Bin Chen, Shu-Tao Xia
Link: http://arxiv.org/abs/2410.15430
Comments: NeurIPS 2024

[9] Generalized Multimodal Fusion via Poisson-Nernst-Planck Equation[cs.CV]
Authors: Jiayu Xiong, Jing Wang, Hengjing Xiang, Jun Xue, Chen Xu, Zhouqiang Jiang
Link: http://arxiv.org/abs/2410.15475
Comments: NeurIPS 2024 Rejected paper, 28 pages

[10] ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos[cs.CV]
Authors: Tao Tang, Hong Liu, Yingxuan You, Ti Wang, Wenhao Li
Link: http://arxiv.org/abs/2410.15582
Code: https://github.com/TangTao-PKU/ARTS
Comments: Accepted by ACM MM 2024. Project page: this https URL

[11] Fully Explicit Dynamic Gaussian Splatting[cs.CV]
Authors: Junoh Lee, Chang-Yeon Won, Hyunjun Jung, Inhwan Bae, Hae-Gon Jeon
Link: http://arxiv.org/abs/2410.15629
Comments: Accepted at NeurIPS 2024

[12] TALoS: Enhancing Semantic Scene Completion via Test-time Adaptation on the Line of Sight[cs.CV]
Authors: Hyun-Kurl Jang, Jihun Kim, Hyeokjun Kweon, Kuk-Jin Yoon
Link: http://arxiv.org/abs/2410.15674
Code: https://github.com/blue-531/TALoS
Comments: Accepted at NeurIPS 2024

[13] LiMTR: Time Series Motion Prediction for Diverse Road Users through Multimodal Feature Integration[cs.CV]
Authors: Camiel Oerlemans, Bram Grooten, Michiel Braat, Alaa Alassi, Emilia Silvas, Decebal Constantin Mocanu
Link: http://arxiv.org/abs/2410.15819
Code: https://github.com/Cing2/LiMTR
Comments: Accepted at the NeurIPS 2024 workshop Time Series in the Age of Large Models

[14] Random Token Fusion for Multi-View Medical Diagnosis[cs.CV]
Authors: Jingyu Guo, Christos Matsoukas, Fredrik Strand, Kevin Smith
Link: http://arxiv.org/abs/2410.15847
Comments: Originally published at the NeurIPS 2024 Workshop on Advancements In Medical Foundation Models: Explainability, Robustness, Security, and Beyond (AIM-FM)

[15] Visual Motif Identification: Elaboration of a Curated Comparative Dataset and Classification Methods[cs.CV]
Authors: Adam Phillips (1), Daniel Grandes Rodriguez (1), Miriam Sánchez-Manzano (1), Alan Salvadó (1), Manuel Garin (1), Gloria Haro (1), Coloma Ballester (1) ((1) Universitat Pompeu Fabra, Barcelona, Spain)
Link: http://arxiv.org/abs/2410.15866
Comments: 17 pages, 11 figures, one table, to be published in the conference proceedings of ECCV 2024

[16] Mitigating Object Hallucination via Concentric Causal Attention[cs.CV]
Authors: Yun Xing, Yiheng Li, Ivan Laptev, Shijian Lu
Link: http://arxiv.org/abs/2410.15926
Code: https://github.com/xing0047/cca-llava
Comments: To appear at NeurIPS 2024

[17] Zero-Shot Scene Reconstruction from Single Images with Deep Prior Assembly[cs.CV]
Authors: Junsheng Zhou, Yu-Shen Liu, Zhizhong Han
Link: http://arxiv.org/abs/2410.15971
Comments: To appear at NeurIPS 2024. Project page: this https URL

[18] START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation[cs.CV]
Authors: Jintao Guo, Lei Qi, Yinghuan Shi, Yang Gao
Link: http://arxiv.org/abs/2410.16020
Code: https://github.com/lingeringlight/START
Comments: Accepted by NeurIPS 2024

[19] Towards Combating Frequency Simplicity-biased Learning for Domain Generalization[cs.CV]
Authors: Xilin He, Jingyu Hu, Qinliang Lin, Cheng Luo, Weicheng Xie, Siyang Song, Muhammad Haris Khan, Linlin Shen
Link: http://arxiv.org/abs/2410.16146
Code: https://github.com/C0notSilly/AdvFrequency
Comments: Accepted by NeurIPS 2024

[20] Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models[cs.CV]
Authors: Giannis Daras, Weili Nie, Karsten Kreis, Alex Dimakis, Morteza Mardani, Nikola Borislavov Kovachki, Arash Vahdat
Link: http://arxiv.org/abs/2410.16152
Comments: Accepted at NeurIPS 2024

[21] 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors[cs.CV]
Authors: Xi Liu, Chaoyi Zhou, Siyu Huang
Link: http://arxiv.org/abs/2410.16266
Code: https://xiliu8006.github.io/3DGS-Enhancer-project
Comments: Accepted by NeurIPS 2024 Spotlight

Natural language processing conferences: 26 papers


[0] QuAILoRA: Quantization-Aware Initialization for LoRA[cs.CL]
Authors: Neal Lawton, Aishwarya Padmakumar, Judith Gaspers, Jack FitzGerald, Anoop Kumar, Greg Ver Steeg, Aram Galstyan
Link: http://arxiv.org/abs/2410.14713
Comments: 12 pages, 7 figures. Submitted to the 4th NeurIPS Workshop on Efficient Natural Language and Speech Processing (ENLSP-IV)

[1] Rethinking Token Reduction for State Space Models[cs.CL]
Authors: Zheng Zhan, Yushu Wu, Zhenglun Kong, Changdi Yang, Yifan Gong, Xuan Shen, Xue Lin, Pu Zhao, Yanzhi Wang
Link: http://arxiv.org/abs/2410.14725
Comments: EMNLP 2024

[2] TimeSeriesExam: A time series understanding exam[cs.CL]
Authors: Yifu Cai, Arjun Choudhry, Mononito Goswami, Artur Dubrawski
Link: http://arxiv.org/abs/2410.14752
Comments: Accepted at NeurIPS'24 Time Series in the Age of Large Models Workshop

[3] Which LLMs are Difficult to Detect? A Detailed Analysis of Potential Factors Contributing to Difficulties in LLM Text Detection[cs.CL]
Authors: Shantanu Thorat, Tianbao Yang
Link: http://arxiv.org/abs/2410.14875
Comments: Accepted at NeurIPS 2024 - Safe Generative AI Workshop

[4] Class-RAG: Content Moderation with Retrieval Augmented Generation[cs.CL]
Authors: Jianfa Chen, Emily Shen, Trupti Bavalatti, Xiaowen Lin, Yongkai Wang, Shuming Hu, Harihar Subramanyam, Ksheeraj Sai Vepuri, Ming Jiang, Ji Qi, Li Chen, Nan Jiang, Ankit Jain
Link: http://arxiv.org/abs/2410.14881
Comments: 11 pages, submitted to ACL

[5] From Test-Taking to Test-Making: Examining LLM Authoring of Commonsense Assessment Items[cs.CL]
Authors: Melissa Roemmele, Andrew S. Gordon
Link: http://arxiv.org/abs/2410.14897
Comments: Accepted at Findings of EMNLP 2024

[6] A Survey of Ontology Expansion for Conversational Understanding[cs.CL]
Authors: Jinggui Liang, Yuxia Wu, Yuan Fang, Hao Fei, Lizi Liao
Link: http://arxiv.org/abs/2410.15019
Code: https://github.com/liangjinggui/Ontology-Expansion
Comments: Accepted by EMNLP 2024

[7] Are LLMs Good Zero-Shot Fallacy Classifiers?[cs.CL]
Authors: Fengjun Pan, Xiaobao Wu, Zongrui Li, Anh Tuan Luu
Link: http://arxiv.org/abs/2410.15050
Code: https://github.com/panFJCharlotte98/Fallacy_Detection
Comments: Accepted to EMNLP 2024 main conference

[8] Toward Robust RALMs: Revealing the Impact of Imperfect Retrieval on Retrieval-Augmented Language Models[cs.CL]
Authors: Seong-Il Park, Jay-Yoon Lee
Link: http://arxiv.org/abs/2410.15107
Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL)

[9] MELT: Materials-aware Continued Pre-training for Language Model Adaptation to Materials Science[cs.CL]
Authors: Junho Kim, Yeachan Kim, Jun-Hyung Park, Yerim Oh, Suho Kim, SangKeun Lee
Link: http://arxiv.org/abs/2410.15126
Comments: Accepted at EMNLP 2024 (Findings)

[10] Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning[cs.CL]
Authors: David Schulte, Felix Hamborg, Alan Akbik
Link: http://arxiv.org/abs/2410.15148
Comments: EMNLP 2024 Main Conference

[11] Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction[cs.CL]
Authors: Yinhan He, Zaiyi Zheng, Patrick Soga, Yaozhen Zhu, Yushun Dong, Jundong Li
Link: http://arxiv.org/abs/2410.15165
Code: https://github.com/YinhanHe123/new
Venue: EMNLP 2024 (Findings)

[12] An Electoral Approach to Diversify LLM-based Multi-Agent Collective Decision-Making[cs.CL]
Authors: Xiutian Zhao, Ke Wang, Wei Peng
Link: http://arxiv.org/abs/2410.15168
Comments: Accepted to EMNLP 2024

[13] IPO: Interpretable Prompt Optimization for Vision-Language Models[cs.CV]
Authors: Yingjun Du, Wenfang Sun, Cees G. M. Snoek
Link: http://arxiv.org/abs/2410.15397
Comments: Accepted by NeurIPS 2024

[14] Evaluating Consistencies in LLM responses through a Semantic Clustering of Question Answering[cs.CL]
Authors: Yanggyu Lee, Jihie Kim
Link: http://arxiv.org/abs/2410.15440
Comments: Accepted to the Trustworthy AI Workshop at IJCAI 2024

[15] "What is the value of {templates}?" Rethinking Document Information Extraction Datasets for LLMs[cs.CL]
Authors: Ran Zmigrod, Pranav Shetty, Mathieu Sibue, Zhiqiang Ma, Armineh Nourbakhsh, Xiaomo Liu, Manuela Veloso
Link: http://arxiv.org/abs/2410.15484
Comments: Accepted to EMNLP Findings 2024

[16] Pruning Foundation Models for High Accuracy without Retraining[cs.CL]
Authors: Pu Zhao, Fei Sun, Xuan Shen, Pinrui Yu, Zhenglun Kong, Yanzhi Wang, Xue Lin
Link: http://arxiv.org/abs/2410.15567
Code: https://github.com/piuzha/APT
Comments: Accepted by EMNLP 2024 Findings

[17] Scalable Data Ablation Approximations for Language Models through Modular Training and Merging[cs.CL]
Authors: Clara Na, Ian Magnusson, Ananya Harsh Jha, Tom Sherborne, Emma Strubell, Jesse Dodge, Pradeep Dasigi
Link: http://arxiv.org/abs/2410.15661
Comments: EMNLP 2024. 17 pages

[18] Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding[cs.CL]
Authors: Derong Xu, Ziheng Zhang, Zhihong Zhu, Zhenxi Lin, Qidong Liu, Xian Wu, Tong Xu, Xiangyu Zhao, Yefeng Zheng, Enhong Chen
Link: http://arxiv.org/abs/2410.15702
Comments: Accepted by EMNLP 2024 Findings

[19] Who's Who: Large Language Models Meet Knowledge Conflicts in Practice[cs.CL]
Authors: Quang Hieu Pham, Hoang Ngo, Anh Tuan Luu, Dat Quoc Nguyen
Link: http://arxiv.org/abs/2410.15737
Comments: Accepted to EMNLP 2024 Findings

[20] Toeing the Party Line: Election Manifestos as a Key to Understand Political Discourse on Twitter[cs.CL]
Authors: Maximilian Maurer, Tanise Ceron, Sebastian Padó, Gabriella Lapesa
Link: http://arxiv.org/abs/2410.15743
Comments: 9 pages, accepted at EMNLP (Findings) 2024

[21] Improve Dense Passage Retrieval with Entailment Tuning[cs.CL]
Authors: Lu Dai, Hao Liu, Hui Xiong
Link: http://arxiv.org/abs/2410.15801
Comments: EMNLP 2024 Main

[22] Mitigating Object Hallucination via Concentric Causal Attention[cs.CV]
Authors: Yun Xing, Yiheng Li, Ivan Laptev, Shijian Lu
Link: http://arxiv.org/abs/2410.15926
Code: https://github.com/xing0047/cca-llava
Comments: To appear at NeurIPS 2024

[23] Large Language Models Know What To Say But Not When To Speak[cs.CL]
Authors: Muhammad Umair, Vasanth Sarathy, JP de Ruiter
Link: http://arxiv.org/abs/2410.16044
Comments: EMNLP 2024 (Findings)

[24] Surprise! Uniform Information Density Isn't the Whole Story: Predicting Surprisal Contours in Long-form Discourse[cs.CL]
Authors: Eleftheria Tsipidi, Franz Nowak, Ryan Cotterell, Ethan Wilcox, Mario Giulianelli, Alex Warstadt
Link: http://arxiv.org/abs/2410.16062
Comments: EMNLP 2024 (main conference)

[25] Analysing the Residual Stream of Language Models Under Knowledge Conflicts[cs.CL]
Authors: Yu Zhao, Xiaotang Du, Giwon Hong, Aryo Pradipta Gema, Alessio Devoto, Hongru Wang, Xuanli He, Kam-Fai Wong, Pasquale Minervini
Link: http://arxiv.org/abs/2410.16090
Comments: Foundation Model Interventions Workshop @ NeurIPS 2024

Thanks to arxiv.org

