MV-RAG: Retrieval Augmented Multiview Diffusion
Authors: Yosef Dayani, Omer Benishu, Sagie Benaim
First: 2025-08-22T17:59:40+00:00 · Latest: 2025-08-22T17:59:40+00:00
Comments: Project page: https://yosefdayani.github.io/MV-RAG
Abstract
Text-to-3D generation approaches have advanced significantly by leveraging
pretrained 2D diffusion priors, producing high-quality and 3D-consistent
outputs. However, they often fail to produce out-of-domain (OOD) or rare
concepts, yielding inconsistent or inaccurate results. To address this, we propose
MV-RAG, a novel text-to-3D pipeline that first retrieves relevant 2D images
from a large in-the-wild 2D database and then conditions a multiview diffusion
model on these images to synthesize consistent and accurate multiview outputs.
Training such a retrieval-conditioned model is achieved via a novel hybrid
strategy bridging structured multiview data and diverse 2D image collections.
This involves training on multiview data using augmented conditioning views
that simulate retrieval variance for view-specific reconstruction, alongside
training on sets of retrieved real-world 2D images using a distinctive held-out
view prediction objective: the model predicts the held-out view from the other
views to infer 3D consistency from 2D data. To facilitate a rigorous OOD
evaluation, we introduce a new collection of challenging OOD prompts.
Experiments against state-of-the-art text-to-3D, image-to-3D, and
personalization baselines show that our approach significantly improves 3D
consistency, photorealism, and text adherence for OOD/rare concepts, while
maintaining competitive performance on standard benchmarks.
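The held-out view prediction objective described above can be illustrated with a minimal sketch of how one training example might be assembled from a set of retrieved images. This is not the authors' code: the function name `split_held_out`, the string placeholders standing in for images, and the uniform random choice of the held-out view are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_held_out(views: Sequence[T], rng: random.Random) -> Tuple[List[T], T]:
    """Pick one view as the prediction target; the rest become conditioning.

    Hypothetical helper: the abstract only states that the model predicts
    a held-out view from the other views, not how that view is chosen.
    """
    if len(views) < 2:
        raise ValueError("need at least two views to hold one out")
    idx = rng.randrange(len(views))
    conditioning = [v for i, v in enumerate(views) if i != idx]
    return conditioning, views[idx]


# Placeholder identifiers stand in for retrieved 2D images of one concept.
retrieved = ["img_a", "img_b", "img_c", "img_d"]
cond, target = split_held_out(retrieved, random.Random(0))
```

In a real training loop the conditioning images would be fed to the multiview diffusion model and the loss would be computed against the held-out target; both steps are outside the scope of this sketch.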
Summary
Text-to-3D generation approaches have advanced significantly by leveraging pretrained 2D diffusion priors, producing high-quality and 3D-consistent outputs.
Benchmarking Training Paradigms, Dataset Composition, and Model Scaling for Child ASR in ESPnet
Authors: Anyu Ying, Natarajan Balaji Shankar, Chyi-Jiunn Lin, Mohan Shi, Pu Wang, Hye-jin Shim, Siddhant Arora, Hugo Van hamme, Abeer Alwan, Shinji Watanabe
First: 2025-08-22T17:59:35+00:00 · Latest: 2025-08-22T17:59:35+00:00
Comments: 5 pages, 3 figures, presented at WOCCI 2025 (Workshop on Child
Computer Interaction), satellite workshop of Interspeech 2025
Abstract
Despite advancements in ASR, child speech recognition remains challenging due
to acoustic variability and limited annotated data. While fine-tuning adult ASR
models on child speech is common, comparisons with flat-start training remain
underexplored. We compare flat-start training across multiple datasets, SSL
representations (WavLM, XEUS), and decoder architectures. Our results show that
SSL representations are biased toward adult speech, with flat-start training on
child speech mitigating these biases. We also analyze model scaling, finding
consistent improvements up to 1B parameters, beyond which performance plateaus.
Additionally, age-related ASR and speaker verification analysis highlights the
limitations of proprietary models like Whisper, emphasizing the need for
open-data models for reliable child speech research. All investigations are
conducted using ESPnet, and our publicly available benchmark provides insights
into training strategies for robust child speech processing.
Summary
Despite advancements in ASR, child speech recognition remains challenging due to acoustic variability and limited annotated data.
Hierarchical Decision-Making for Autonomous Navigation: Integrating Deep Reinforcement Learning and Fuzzy Logic in Four-Wheel Independent Steering and Driving Systems
Authors: Yizhi Wang, Degang Xu, Yongfang Xie, Shuzhong Tan, Xianan Zhou, Peng Chen
First: 2025-08-22T17:57:56+00:00 · Latest: 2025-08-22T17:57:56+00:00
Abstract
This paper presents a hierarchical decision-making framework for autonomous
navigation in four-wheel independent steering and driving (4WISD) systems. The
proposed approach integrates deep reinforcement learning (DRL) for high-level
navigation with fuzzy logic for low-level control to ensure both task
performance and physical feasibility. The DRL agent generates global motion
commands, while the fuzzy logic controller enforces kinematic constraints to
prevent mechanical strain and wheel slippage. Simulation experiments
demonstrate that the proposed framework outperforms traditional navigation
methods, offering enhanced training efficiency and stability and mitigating
erratic behaviors compared to purely DRL-based solutions. Real-world
validations further confirm the framework's ability to navigate safely and
effectively in dynamic industrial settings. Overall, this work provides a
scalable and reliable solution for deploying 4WISD mobile robots in complex,
real-world scenarios.
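The split between a high-level command and a low-level fuzzy layer can be sketched as follows. This is only a toy illustration under stated assumptions: the membership breakpoints, the three-rule base, the output scales, and the names `tri`, `speed_scale`, and `apply_low_level` are all hypothetical and are not taken from the paper. It uses zero-order Sugeno-style inference (a weighted average of rule outputs) to slow the vehicle when a large steering change is requested.

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def speed_scale(steer_change_deg: float) -> float:
    """Map a requested steering change to a speed scale in [0.2, 1.0].

    Assumed rules: small change -> 1.0, medium -> 0.6, large -> 0.2.
    """
    x = min(abs(steer_change_deg), 90.0)
    rules = [
        (tri(x, -1.0, 0.0, 30.0), 1.0),   # small steering change
        (tri(x, 15.0, 45.0, 75.0), 0.6),  # medium steering change
        (tri(x, 60.0, 90.0, 91.0), 0.2),  # large steering change
    ]
    total = sum(weight for weight, _ in rules)
    return sum(weight * out for weight, out in rules) / total


def apply_low_level(speed: float, steer_target_deg: float,
                    current_steer_deg: float) -> float:
    """Scale the high-level speed command by the fuzzy layer's output."""
    return speed * speed_scale(steer_target_deg - current_steer_deg)


# A high-level agent (not modeled here) would supply speed and steer target.
safe_speed = apply_low_level(1.0, 45.0, 0.0)
```

The point of the design is that the learned policy never commands the wheels directly; every command passes through a hand-specified layer whose behavior can be inspected rule by rule.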
Summary
This paper presents a hierarchical decision-making framework for autonomous navigation in four-wheel independent steering and driving (4WISD) systems.
Are LLM-Powered Social Media Bots Realistic?
Authors: Lynnette Hui Xian Ng, Kathleen M. Carley
First: 2025-08-01T18:06:13+00:00 · Latest: 2025-08-22T17:56:26+00:00
Comments: Accepted into SBP-BRiMS 2025
Abstract
As Large Language Models (LLMs) become more sophisticated, it is increasingly
possible to harness them to power social media bots. This work investigates
the realism of generating LLM-Powered social media bot networks. Through a
combination of manual effort, network science and LLMs, we create synthetic bot
agent personas, their tweets and their interactions, thereby simulating social
media networks. We compare the generated networks against empirical bot/human
data, observing that both network and linguistic properties of LLM-Powered Bots
differ from Wild Bots/Humans. This has implications for the detection and
effectiveness of LLM-Powered Bots.
Summary
As Large Language Models (LLMs) become more sophisticated, there is a possibility to harness LLMs to power social media bots.
LLM-Based Agents for Competitive Landscape Mapping in Drug Asset Due Diligence
Authors: Alisa Vinogradova, Vlad Vinogradov, Dmitrii Radkevich, Ilya Yasny, Dmitry Kobyzev, Ivan Izmailov, Katsiaryna Yanchanka, Andrey Doronichev
First: 2025-08-22T17:50:00+00:00 · Latest: 2025-08-22T17:50:00+00:00
Abstract
In this paper, we describe and benchmark a competitor-discovery component
used within an agentic AI system for fast drug asset due diligence. A
competitor-discovery AI agent, given an indication, retrieves all drugs
comprising the competitive landscape of that indication and extracts canonical
attributes for these drugs. The competitor definition is investor-specific, and
data is paywalled/licensed, fragmented across registries, ontology-mismatched
by indication, alias-heavy for drug names, multimodal, and rapidly changing.
Although considered the best tool for this problem, current LLM-based AI
systems cannot reliably retrieve all competing drug names, and
there is no accepted public benchmark for this task. To address the lack of
evaluation, we use LLM-based agents to transform five years of multi-modal,
unstructured diligence memos from a private biotech VC fund into a structured
evaluation corpus mapping indications to competitor drugs with normalized
attributes. We also introduce a competitor-validating LLM-as-a-judge agent that
filters out false positives from the list of predicted competitors to maximize
precision and suppress hallucinations. On this benchmark, our
competitor-discovery agent achieves 83% recall, exceeding OpenAI Deep Research
(65%) and Perplexity Labs (60%). The system is deployed in production with
enterprise users; in a case study with a biotech VC investment fund, analyst
turnaround time dropped from 2.5 days to ~3 hours (~20x) for the
competitive analysis.
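The metrics mentioned above can be made concrete with a short sketch of how recall and precision over competitor lists might be computed, how a judge step that drops false positives affects precision, and how the reported speedup follows from the stated times. The drug names are invented placeholders, the `judge_filter` callback is a stand-in for the LLM-as-a-judge agent, and matching names by lowercasing is an assumption; the abstract notes that real drug names are alias-heavy, which this sketch does not handle.

```python
from typing import Callable, Iterable, List


def recall(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Fraction of gold competitors that appear among the predictions."""
    gold_set = {g.lower() for g in gold}
    if not gold_set:
        raise ValueError("gold set must not be empty")
    return len(gold_set & {p.lower() for p in predicted}) / len(gold_set)


def precision(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Fraction of predictions that are gold competitors."""
    pred_set = {p.lower() for p in predicted}
    if not pred_set:
        raise ValueError("prediction set must not be empty")
    return len(pred_set & {g.lower() for g in gold}) / len(pred_set)


def judge_filter(predicted: Iterable[str],
                 is_competitor: Callable[[str], bool]) -> List[str]:
    """Keep only predictions the (hypothetical) judge accepts."""
    return [p for p in predicted if is_competitor(p)]


# Toy data, not from the paper's benchmark.
gold = ["drug_a", "drug_b", "drug_c", "drug_d"]
predicted = ["Drug_A", "drug_b", "drug_x"]

raw_recall = recall(predicted, gold)
filtered = judge_filter(predicted, lambda name: name.lower() != "drug_x")
filtered_precision = precision(filtered, gold)

# Speedup implied by the reported turnaround times: 2.5 days vs. about 3 hours.
speedup = (2.5 * 24) / 3
```

Filtering can only remove predictions, so it raises or preserves precision but can never raise recall; that is why the judge is described as a precision mechanism.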
Summary
In this paper, we describe and benchmark a competitor-discovery component used within an agentic AI system for fast drug asset due diligence.