DeepSeek’s New AI Model May Be Trained on Google’s Gemini: Developer Raises Eyebrows Over Similarities

In a curious twist in the rapidly evolving world of artificial intelligence, DeepSeek’s latest AI model, R1-0528, has come under scrutiny for displaying notable similarities with Google’s Gemini 2.5 Pro, raising questions about its training data and potential influence from proprietary models.

The observation was first made public by developer Sam Paech, who shared a detailed analysis highlighting behavioral and output-based parallels between DeepSeek’s R1-0528 model and Google’s Gemini 2.5 Pro. Paech’s assessment—circulated across developer forums and AI research communities—suggests that the performance, language generation patterns, and even certain contextual cues exhibited by R1-0528 bear a striking resemblance to Google’s Gemini.

Similarities Fuel Speculation

According to Paech, R1-0528 appears to respond to complex queries in a manner nearly indistinguishable from Gemini 2.5 Pro, especially in areas like coding assistance, logical reasoning, and multilingual responses. In side-by-side comparisons, both models produced nearly identical outputs on a variety of prompts—ranging from mathematical queries to sentiment analysis and advanced programming tasks.
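Side-by-side comparisons of this kind typically start with simple lexical similarity between the two models' responses to the same prompt. As a rough illustration (not Paech's actual methodology), here is a minimal sketch using only Python's standard library; the prompt responses below are invented for demonstration:

```python
import difflib

def output_similarity(a: str, b: str) -> float:
    """Rough lexical similarity between two model outputs, 0.0 to 1.0.

    Tokenizes on whitespace and compares token sequences; real analyses
    would also look at phrasing habits, formatting, and error patterns.
    """
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()

# Hypothetical responses from two different models to the same math prompt.
resp_model_a = "The derivative of x squared is 2x by the power rule."
resp_model_b = "The derivative of x squared is 2x, by the power rule."

score = output_similarity(resp_model_a, resp_model_b)
print(f"similarity: {score:.2f}")
```

A high score on one prompt means little, since correct answers naturally converge; the suspicion arises when near-identical wording recurs across many diverse prompts.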

While it’s not uncommon for models to converge on certain answers due to overlapping datasets or training strategies, the degree of alignment between the two has led to speculation that DeepSeek’s model may have been trained using data influenced—directly or indirectly—by Gemini outputs.

Questions Around Model Training Practices

The similarities have reignited broader conversations around AI model training transparency, data sourcing ethics, and the use of synthetic content generated by proprietary models. If DeepSeek’s R1-0528 was in fact trained on data produced by Gemini or any other closed-source system, it could raise intellectual property concerns and spark debate over the boundaries of permissible data use in AI development.

While there is no official evidence yet confirming any direct use of Gemini-generated data, the situation underscores the challenges of auditing black-box AI systems, especially when model weights, training datasets, and fine-tuning methods are not publicly disclosed.

DeepSeek’s Silence Raises More Questions

As of now, DeepSeek has not released an official statement addressing the claims or providing details about the training methodology of R1-0528. The company’s silence has only added to the curiosity among AI researchers and developers, many of whom are calling for more transparency and accountability in the development of large language models (LLMs).

The issue highlights a growing concern within the tech industry: as AI models become more powerful and accessible, the lines between original research, derivative works, and training on synthetic content continue to blur.

Industry Implications

If the similarities are indeed more than coincidental, it could have legal and reputational implications for DeepSeek and raise further questions about the reuse of model-generated content in training data pipelines. It also spotlights the need for clearer standards and guidelines around model training, especially as open-source and commercial AI tools proliferate.

Conclusion

While more evidence is needed to confirm whether DeepSeek’s R1-0528 was trained on or influenced by Google’s Gemini 2.5 Pro, the growing scrutiny points to a deeper need for transparency in AI model development. As the AI race accelerates, responsible innovation and ethical practices will be essential to maintaining trust and fairness in the ecosystem.
