Make The Complicated World Simpler Through Technology

Baidu’s Search Science Team is hiring! Please drop a message to search_science@baidu.com if interested.

Join Us

Research Topics

Our research topics include (but are not limited to)

Search and Ranking

Our research topics for search & rank include (but are not limited to):
  • Learning to Rank
  • Web Search Models and Ranking
  • Query Analysis
  • Text Mining for Search
  • User Behavior Modeling
  • Evaluation Methodologies and Metrics
  • Interactive Search and Result Presentation
  • Semantic Search

Natural Language Processing

    Our research directions for natural language processing include (but are not limited to):
  • Dialogue and Interactive Systems
  • Text Summarization
  • Information Extraction and Text Mining
  • Representation Learning
  • Natural Language Understanding
  • Machine Translation
  • Topic Modeling
  • Semantic Matching

Multimedia Retrieval

    Our research directions for multimedia retrieval include (but are not limited to):
  • Content Mining of Multimedia and Multimodal Web Data
  • Web-mediated Communities
  • Image and Video Retrieval
  • Image and Video Understanding
  • Object Detection and Recognition

Artificial Intelligence

    Our research directions for artificial intelligence include (but are not limited to):
  • Knowledge Representation
  • Crowdsourcing
  • Recommender Systems
  • Deep Metric Learning
  • Social Networks

News

    Recent

    2024-05 "Exploring Memorization in Fine-tuned Language Models" is accepted by ACL 2024!

    2024-05 "The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG)" is accepted by ACL 2024!

    2024-03 "Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method" is accepted by NAACL 2024!

    2024-03 "A Robust Semantics-based Watermark for Large Language Model against Paraphrasing" is accepted by NAACL 2024!

    2024-03 "MILL: Mutual Verification with Large Language Models for Zero-Shot Query Expansion" is accepted by NAACL 2024!

    2023-10 "Text-Video Retrieval via Multi-Modal Hypergraph Networks" is accepted by WSDM 2024!

    2023-10 "Large Language Models for Data Augmentation in Recommendation" is accepted by WSDM 2024!

    2023-10 "Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents" is accepted by EMNLP 2023!

    2023-10 "DiQAD: A Benchmark Dataset for Open-domain Dialogue Quality Assessment" is accepted by EMNLP 2023!

    2023-10 "GS2P: A Generative Pre-trained Learning to Rank Model with Over-parameterization for Web-Scale Search" has received IEEE DSAA 2023 Best Paper Award!

    2023-08 "Learning to Tokenize for Generative Retrieval" is accepted by NeurIPS 2023!

    2023-08 "I^3 Retriever: Incorporating Implicit Interaction in Pre-trained Language Models for Passage Retrieval" is accepted by CIKM 2023!

    2023-05 "Boosting Event Extraction with Denoised Structure-to-Text Augmentation" is accepted by ACL 2023!

    2023-05 "Are Message Passing Neural Networks Really Helpful for Knowledge Graph Completion?" is accepted by ACL 2023!

    2023-05 Three papers are accepted by KDD 2023!

    2023-04 Two papers are accepted by SIGIR 2023!


    Recent Publications

    Qian Li, Lixin Su, Jiashu Zhao, Long Xia, Hengyi Cai, Suqi Cheng, Hengzhu Tang, Junfeng Wang, Dawei Yin. Text-Video Retrieval via Multi-Modal Hypergraph Networks. Accepted by WSDM 2024.

    Wei Wei, Xubin Ren, Jiabin Tang, Qinyong Wang, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin and Chao Huang. Large Language Models for Data Augmentation in Recommendation. Accepted by WSDM 2024.

    Weiwei Sun, Lingyong Yan, Xinyu Ma, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, Zhaochun Ren. Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents. Accepted by EMNLP 2023.

    Yukun Zhao*, Lingyong Yan*, Weiwei Sun, Chong Meng, Shuaiqiang Wang, Zhicong Cheng, Zhaochun Ren, Dawei Yin. DiQAD: A Benchmark Dataset for Open-domain Dialogue Quality Assessment. Accepted by EMNLP 2023 Findings. (* equal contribution)

    Weiwei Sun, Lingyong Yan, Zheng Chen, Shuaiqiang Wang, Haichao Zhu, Pengjie Ren, Zhumin Chen, Dawei Yin, Maarten de Rijke, Zhaochun Ren. Learning to Tokenize for Generative Retrieval. Accepted by NeurIPS 2023.

    Qian Dong, Yiding Liu, Qingyao Ai, Haitao Li, Shuaiqiang Wang, Yiqun Liu, Dawei Yin, Shaoping Ma. I^3 Retriever: Incorporating Implicit Interaction in Pre-trained Language Models for Passage Retrieval. In CIKM 2023.

    Yuchen Li, Haoyi Xiong, Linghe Kong, Zeyi Sun, Hongyang Chen, Shuaiqiang Wang, and Dawei Yin. MPGraf: a Modular and Pre-trained Graphformer for Learning to Rank at Web-scale. In ICDM 2023.

    Yubao Tang, Ruqing Zhang, Jiafeng Guo, Jiangui Chen, Zuowei Zhu, Shuaiqiang Wang, Dawei Yin, Xueqi Cheng. Semantic-Enhanced Differentiable Search Index Inspired by Learning Strategies. In KDD 2023 (Applied Data Science track).

    Rong Huang, Danfeng Zhang, Weixue Lu, Han Li, Meng Wang, Daiting Shi, Jun Fan, Zhicong Cheng, Simiu Gu, Dawei Yin. Learning Discrete Document Representations in Web Search. In KDD 2023 (Applied Data Science track).

    Yuchen Li, Haoyi Xiong, Linghe Kong, Qingzhong Wang, Shuaiqiang Wang, Guihai Chen, Dawei Yin. S2phere: Semi-Supervised Pre-training for Web Search over Heterogeneous Learning to Rank Data. In KDD 2023 (Applied Data Science track).

    Bo Wang, Heyan Huang, Xiaochi Wei, Ge Shi, Xiao Liu, Chong Feng, Tong Zhou, Shuaiqiang Wang, Dawei Yin. Boosting Event Extraction with Denoised Structure-to-Text Augmentation. In ACL 2023 Findings.

    Juanhui Li, Harry Shomer, Jiayuan Ding, Yiqi Wang, Yao Ma, Neil Shah, Jiliang Tang, Dawei Yin. Are Message Passing Neural Networks Really Helpful for Knowledge Graph Completion? In ACL 2023.

    Xubin Ren, Chao Huang, Lianghao Xia, Jiashu Zhao and Dawei Yin. Disentangled Contrastive Collaborative Filtering. In SIGIR 2023.

    Juanhui Li, Wei Zeng, Suqi Cheng, Yao Ma, Jiliang Tang, Shuaiqiang Wang and Dawei Yin. Graph Enhanced BERT for Query Understanding. In SIGIR 2023 (Industry Track).

    Dan Luo, Lixin Zou, Qingyao Ai, Zhiyu Chen, Dawei Yin, Brian D Davison. Model-based Unbiased Learning to Rank. In WSDM 2023.

    Yuchen Li, Haoyi Xiong, Qingzhong Wang, Linghe Kong, Hao Liu, Haifang Li, Jiang Bian, Shuaiqiang Wang, Guihai Chen, Dejing Dou, Dawei Yin. COLTR: Semi-supervised Learning to Rank with Co-training and Over-parameterization for Web Search. In TKDE 2023.

    Recruiting

    We are hiring researchers who are driven to innovate in the areas of Information Retrieval, Natural Language Processing, Multimedia, Data Mining, Machine Learning and Artificial Intelligence. Please send your CV to search_science@baidu.com if interested.

    The Search Science team at Baidu is passionate about addressing theoretical and technical challenges in the dominant search engine in China, which is also one of the largest AI and internet portals in the world. The team provides opportunities to innovate in a start-up mode and lets you contribute to novel projects and technologies that get deployed on baidu.com. It is full of new challenges and opportunities for innovation. We are looking for world-class, fun-loving researchers to join us!

    Job Requirements

    1. Master's or PhD degree in Computer Science or equivalent.
    2. Experience in Information Retrieval & Web Search, Natural Language Processing, Multimedia, and other related areas.
    3. Publications at top-tier peer-reviewed conferences or journals.
    4. Proven track record of innovation in creating novel algorithms and advancing the state of the art.

    Location:

    Baidu Technology Park,
    Haidian District,
    Beijing, China

    Email:

    search_science@baidu.com

    Call:

    (+86 10)5992 8888