We introduce a new reasoning-based multimodal guardrail model termed GuardReasoner-Omni that moderates text, image, and video content. It is trained on a comprehensive multimodal corpus and outperforms prior baselines across guardrail benchmarks.
Kimi K2.5 presents an open, multi-modal, agentic LLM that achieves strong general intelligence in agentic search, coding, and multi-modal tasks. My contribution is leading the effort to improve extremely long-horizon coding ability, e.g., on PaperBench, via data scaling, SFT, and RL.
Terminal-Bench introduces a realistic command-line benchmark showing that current AI agents still struggle with complex, long-horizon CLI workflows.
My contributions focus on proposing and implementing the tasks in Terminal-Bench.
Kimi K2 presents an open, agentic LLM that achieves strong general intelligence in coding, tool-use, and math. My contributions focus on improving agentic coding ability, including leading the PaperBench evaluation and contributing to Docker scaling and scaffold scaling.
We propose a new VLM safeguard termed GuardReasoner-VL that incentivizes the guard model to reason deliberatively before making moderation decisions via online RL. Experiments on 14 multi-modal benchmarks demonstrate its superiority.
We propose a reasoning-based meta-agent termed FlowReasoner to automate the design of query-level multi-agent systems, i.e., one system per query, using distillation and reinforcement learning from external execution feedback.
We conduct a comprehensive survey on efficient inference for large reasoning models (LRMs). We categorize existing methods into two main categories: explicit compact CoT and implicit latent CoT. We summarize the challenges and highlight directions for further improvement.
We propose a new LLM safeguard termed GuardReasoner by guiding it to learn to reason. It improves reasoning ability, explainability, and generalizability via Reasoning SFT and Hard-Sample DPO. Experiments on 13 benchmarks across 3 guardrail tasks demonstrate its superiority. The data, code, and models (1B, 3B, 8B) are released.
We propose a simple yet effective jailbreak attack termed FlipAttack against black-box LLMs that succeeds within only 1 query. By analyzing LLMs' understanding mechanism, we design 4 flipping modes to disguise the attack, then guide the LLM to understand and execute the harmful behaviors. Experiments on 8 LLMs and 5 guard models demonstrate its superiority.
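For intuition, here is a minimal Python sketch of the kind of string-level flipping such a disguise relies on; the function names and mode boundaries are illustrative, not the paper's exact mode definitions.

```python
# Illustrative flipping transforms (a sketch, not FlipAttack's exact modes).

def flip_word_order(prompt: str) -> str:
    """Reverse the order of words: 'how to do X' -> 'X do to how'."""
    return " ".join(reversed(prompt.split()))

def flip_chars_in_word(prompt: str) -> str:
    """Reverse the characters inside each word, keeping word order."""
    return " ".join(word[::-1] for word in prompt.split())

def flip_chars_in_sentence(prompt: str) -> str:
    """Reverse the entire sentence character by character."""
    return prompt[::-1]

if __name__ == "__main__":
    demo = "a harmless example sentence"
    for flip in (flip_word_order, flip_chars_in_word, flip_chars_in_sentence):
        print(flip.__name__, "->", flip(demo))
```

In the attack, the disguised prompt is paired with guidance that teaches the model to recover the original text before acting on it.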
We propose an unsupervised group recommendation method named ITR that first identifies user groups and then conducts self-supervised group recommendation via two pretext tasks. Results on both open and industrial data show its effectiveness.
We propose an intent learning method termed ELCRec, which leverages end-to-end learnable clustering and cluster-assisted contrastive learning to improve recommendation. Results on both open benchmarks and industrial engines demonstrate its superiority.
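As an illustration of end-to-end learnable clustering, the sketch below treats cluster centers as ordinary trainable parameters updated jointly with the encoder; the class name and loss are a simplification, not ELCRec's exact formulation.

```python
# Simplified PyTorch sketch of learnable clustering (not ELCRec's exact loss).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableClustering(nn.Module):
    def __init__(self, num_clusters: int, dim: int):
        super().__init__()
        # Cluster centers are plain parameters, optimized by backprop
        # together with the recommendation encoder (no separate k-means step).
        self.centers = nn.Parameter(torch.randn(num_clusters, dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        dist = torch.cdist(z, self.centers)       # (batch, num_clusters)
        assign = F.softmax(-dist, dim=-1)         # soft cluster assignment
        target = assign @ self.centers            # per-sample soft centroid
        return F.mse_loss(z, target)              # pull embeddings to centers

cluster_head = LearnableClustering(num_clusters=8, dim=64)
z = torch.randn(32, 64, requires_grad=True)       # intent embeddings
loss = cluster_head(z)                            # add to the main objective
loss.backward()
```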
We investigate the underlying causes of representation collapse in deep graph clustering and improve the dual correlation reduction network with an affinity recovery strategy.
We aim to extend deep graph clustering to temporal graphs, which are more practical in real-world scenarios. We propose a general framework, TGC, based on clustering distribution assignment and adjacency reconstruction.
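The two objectives named above can be sketched as follows; the temporal encoder and TGC's exact formulations are omitted, so treat this as a generic DEC-style instantiation.

```python
# Generic sketch of the two objectives (not TGC's exact formulation).
import torch
import torch.nn.functional as F

def clustering_assignment_loss(z, centers, alpha=1.0):
    """DEC-style: soft assignment Q is sharpened into target P, matched by KL."""
    dist2 = torch.cdist(z, centers) ** 2
    q = (1.0 + dist2 / alpha) ** (-(alpha + 1) / 2)
    q = q / q.sum(dim=1, keepdim=True)             # soft assignment Q
    p = (q ** 2) / q.sum(dim=0)
    p = (p / p.sum(dim=1, keepdim=True)).detach()  # sharpened target P
    return F.kl_div(q.log(), p, reduction="batchmean")

def adjacency_reconstruction_loss(z, adj):
    """Reconstruct the adjacency matrix from embedding inner products."""
    logits = z @ z.t()
    return F.binary_cross_entropy_with_logits(logits, adj)
```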
We explore at which training stage code data helps LLMs learn to reason. Extensive experiments and insights deepen our understanding of LLMs' reasoning capability and its applications, e.g., scientific question answering and legal support.
We show that the promising performance of deep graph clustering methods relies on a pre-defined cluster number and propose RGC to determine the cluster number via reinforcement learning.
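A toy version of the idea treats the cluster number as the action and clustering quality as the reward; RGC's actual state, action, and reward design is richer than this epsilon-greedy bandit, and the data here is a random placeholder.

```python
# Toy epsilon-greedy bandit over the cluster number (not RGC's actual design).
import random
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.randn(200, 16)                 # placeholder node embeddings
candidates = list(range(2, 11))              # candidate cluster numbers
q = {k: 0.0 for k in candidates}             # value estimate per action k

for step in range(50):
    explore = random.random() < 0.2
    k = random.choice(candidates) if explore else max(q, key=q.get)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    reward = silhouette_score(X, labels)     # clustering quality as reward
    q[k] += 0.1 * (reward - q[k])            # incremental value update

print("selected cluster number:", max(q, key=q.get))
```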
We propose a plug-and-play knowledge graph contrastive learning method named KGE-SymCL that mines the symmetric structural information in knowledge graphs.
We analyze the drawbacks of existing deep graph clustering methods and scale deep graph clustering to large-scale graphs. The proposed shrink and dilation losses optimize the clustering distribution adversarially, enabling mini-batch training without performance degradation.
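The adversarial pair can be sketched as a shrink term that pulls batch embeddings toward their cluster centers and a dilation term that pushes the centers apart; the paper's exact definitions may differ, and the margin below is an arbitrary choice.

```python
# Sketch of shrink/dilation-style losses (the paper's exact forms may differ).
import torch

def shrink_loss(z, centers, assign):
    """Pull each embedding toward its assigned cluster center."""
    return ((z - centers[assign]) ** 2).sum(dim=1).mean()

def dilation_loss(centers, margin=2.0):
    """Push cluster centers at least `margin` apart from one another."""
    dist = torch.cdist(centers, centers)
    k = centers.shape[0]
    off_diag = dist[~torch.eye(k, dtype=torch.bool)]
    return torch.relu(margin - off_diag).mean()

# Both terms touch only the current mini-batch plus the small set of
# centers, which is what makes batch training on large graphs feasible.
```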
We propose to replace complicated and time-consuming graph data augmentations by designing parameter-unshared Siamese encoders and perturbing node embeddings.
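A minimal sketch of this augmentation-free design: two encoders with unshared parameters read the same input, and one view is perturbed with Gaussian noise in embedding space; the linear encoders and noise scale are placeholder choices.

```python
# Sketch of the augmentation-free two-view construction (placeholder encoders).
import torch
import torch.nn as nn
import torch.nn.functional as F

dim_in, dim_out, sigma = 128, 64, 0.01
encoder1 = nn.Linear(dim_in, dim_out)   # parameters NOT shared with encoder2
encoder2 = nn.Linear(dim_in, dim_out)

x = torch.randn(256, dim_in)            # node features (placeholder)
z1 = F.normalize(encoder1(x), dim=1)
noise = sigma * torch.randn(x.size(0), dim_out)
z2 = F.normalize(encoder2(x) + noise, dim=1)
# z1 and z2 are then aligned with a contrastive objective, with no graph
# augmentation (edge dropping, feature masking, etc.) required.
```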
We propose a Hard Sample Aware Network (HSAN) to mine both hard positive and hard negative samples using a comprehensive similarity measure criterion and a general dynamic sample weighting strategy.
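The weighting idea can be sketched in a focal style: up-weight hard positives (low similarity) and hard negatives (high similarity), and let easy pairs fade; HSAN's actual criterion combines attribute and structure similarity, which this sketch omits.

```python
# Focal-style dynamic sample weighting (a sketch, not HSAN's exact criterion).
import torch

def dynamic_weights(sim: torch.Tensor, is_positive: torch.Tensor, beta: float = 2.0):
    """Hard positives (low sim) and hard negatives (high sim) get large weights."""
    hardness = torch.where(is_positive, 1.0 - sim, sim)
    return hardness ** beta

sim = torch.tensor([0.9, 0.2, 0.8, 0.1])          # pairwise similarities
is_pos = torch.tensor([True, True, False, False])
print(dynamic_weights(sim, is_pos))               # hard pairs dominate
```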
Deep graph clustering, which aims to group the nodes of a graph into disjoint clusters, has become a hot research topic. This paper summarizes the taxonomy, challenges, and applications of deep graph clustering. We hope this work serves as a quick guide and helps researchers overcome the challenges in this field.
We propose a self-supervised deep graph clustering method termed Dual Correlation Reduction Network (DCRN) that addresses the representation collapse issue by reducing information correlation at both the sample and feature levels.
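The core idea can be sketched as pushing a cross-view correlation matrix toward the identity, applied once over samples and once over features; DCRN's full method includes further components (e.g., propagated regularization) that this sketch omits.

```python
# Sketch of dual correlation reduction (omits DCRN's other components).
import torch
import torch.nn.functional as F

def correlation_reduction_loss(z1, z2):
    """Push the cross-view correlation matrix toward the identity."""
    c = F.normalize(z1, dim=1) @ F.normalize(z2, dim=1).t()
    eye = torch.eye(c.shape[0], device=c.device)
    return ((c - eye) ** 2).mean()

z1, z2 = torch.randn(64, 32), torch.randn(64, 32)           # two-view embeddings
sample_level = correlation_reduction_loss(z1, z2)           # N x N correlations
feature_level = correlation_reduction_loss(z1.t(), z2.t())  # D x D correlations
loss = sample_level + feature_level
```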
"If we knew what it was we were doing, it would not be called research, would it?"                                                     --Albert Einstein