
DeepSeek Exposes Over 1 Million Chat Records

Last week, the world was stunned by the performance of DeepSeek's R1 large language model, developed at a fraction of the cost it took OpenAI and others to build comparable models. Along with news of DeepSeek's impact on US markets and talk of a possible "Sputnik" moment, DeepSeek also made cybersecurity headlines for the wrong reasons.


Wiz Research discovered a publicly accessible database belonging to DeepSeek that allowed complete control over database operations, including the ability to access internal data. The exposed database held over a million lines of log streams containing highly sensitive information, including chat histories, secret keys, and backend details.

According to a recent blog post by Wiz Research, the database was discovered soon after R1 was released to the public.

Researchers stated,

Within minutes, we found a publicly accessible ClickHouse database linked to DeepSeek, completely open and unauthenticated, exposing sensitive data. It was hosted at oauth2callback.deepseek.com:9000 and dev.deepseek.com:9000…This database contained a significant volume of chat history, backend data and sensitive information, including log streams, API Secrets, and operational details…More critically, the exposure allowed for full database control and potential privilege escalation within the DeepSeek environment, without any authentication or defense mechanism to the outside world.

If a threat actor compromised the database and granted themselves higher privileges, they could severely damage DeepSeek's infrastructure and potentially access network-connected services that are not normally publicly accessible. In the blog, researchers explained how the exposure was initially discovered.

Researchers first analyzed DeepSeek's publicly accessible domains, which revealed 30 internet-facing subdomains. Most appeared benign, hosting elements like the chatbot interface, status page, and API documentation. None of these would initially suggest a high-risk database exposure.

Soon, two suspicious internet-facing assets were discovered on open TCP ports 8123 and 9000. These were traced to exposed ClickHouse databases and, without much digging, were found to be accessible without any authentication checks.

ClickHouse is an open-source, columnar database management system primarily used for fast analytical queries on large datasets. Originally developed at Yandex, it is widely used for real-time data processing, log storage, and big data analytics, so an exposed instance is likely to give a threat actor access to highly sensitive information.
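As a rough sketch of how such an exposure can be spotted (host names below are placeholders, not DeepSeek's real endpoints), ClickHouse's HTTP interface accepts a query directly as a URL parameter, so an unauthenticated instance will answer a harmless SELECT 1 to anyone who asks:

```python
# Hypothetical probe sketch: hosts and ports below are examples only.
# ClickHouse's HTTP interface (default port 8123) accepts queries as a
# URL parameter, so an open instance answers queries with no credentials.
import urllib.error
import urllib.request
from urllib.parse import quote

def build_probe_url(host: str, port: int = 8123, query: str = "SELECT 1") -> str:
    """Build the HTTP-interface URL that runs `query` against ClickHouse."""
    return f"http://{host}:{port}/?query={quote(query)}"

def answers_without_auth(host: str, port: int = 8123, timeout: float = 5.0) -> bool:
    """Return True if the endpoint runs a query with no credentials supplied."""
    try:
        with urllib.request.urlopen(build_probe_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

A harmless read-only probe like this is how a defender (or an attacker) can confirm in seconds that an instance is wide open; anything beyond SELECT 1 against someone else's server crosses into unauthorized access.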

Regarding the level of access to sensitive DeepSeek data, researchers stated,

This level of access posed a critical risk to DeepSeek's own security and for its end-users. Not only an attacker could retrieve sensitive logs and actual plain-text chat messages, but they could also potentially exfiltrate plaintext passwords and local files along propriety information directly from the server using queries like: SELECT * FROM file('filename') depending on their ClickHouse configuration.
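As a hedged illustration (not DeepSeek's actual configuration), the basic ClickHouse mitigations for this class of exposure are to require a password, restrict which networks may connect, and assign a read-only settings profile in the server's users.xml; every value below is a placeholder:

```xml
<!-- Illustrative users.xml fragment; hashes, networks, and profile
     names are placeholders, not a real deployment's values. -->
<clickhouse>
    <profiles>
        <readonly_profile>
            <!-- 1 = queries may read data but not write or change settings -->
            <readonly>1</readonly>
        </readonly_profile>
    </profiles>
    <users>
        <default>
            <!-- require credentials instead of the empty default password -->
            <password_sha256_hex>PLACEHOLDER_SHA256_HASH</password_sha256_hex>
            <!-- only allow connections from the internal network -->
            <networks>
                <ip>10.0.0.0/8</ip>
            </networks>
            <profile>readonly_profile</profile>
        </default>
    </users>
</clickhouse>
```

Network restrictions matter as much as the password here: per the researchers, these instances should never have been reachable from the public internet at all.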

Rapid Adoption Security Risks

The rapid adoption of GenAI technologies has left security as an afterthought, much to the detriment of the technology as a whole. Far too much attention has been paid to the future threats AI might empower, while more mundane threats, like poorly configured databases, are having an impact today.

Wiz Research concluded,

The world has never seen a piece of technology adopted at the pace of AI. Many AI companies have rapidly grown into critical infrastructure providers without the security frameworks that typically accompany such widespread adoptions. As AI becomes deeply integrated into businesses worldwide, the industry must recognize the risks of handling sensitive data and enforce security practices on par with those required for public cloud providers and major infrastructure providers.

Several AI security best practices have been developed after this rapid adoption to address security concerns and the types of attacks we already see. These practices include customizing GenAI services like DeepSeek or ChatGPT to enhance security, which can involve designing models with built-in security features such as access controls, anomaly detection, and automated threat response mechanisms.

Organizations can also harden AI models by implementing techniques such as adversarial training, where models are trained with both standard and adversarial examples, which can enhance their robustness. Defensive measures like input validation and anomaly detection can also help protect AI models from adversarial attacks. These strategies ensure that the models are less susceptible to manipulation, maintaining their accuracy and reliability in real-world applications.
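As a minimal illustration of the adversarial-example idea behind such training (a toy linear scorer, not a production defense; all names and numbers are invented for the example), the classic fast-gradient-sign method nudges an input in the direction that most increases the model's loss:

```python
import numpy as np

# Toy fast-gradient-sign (FGSM) perturbation against a linear scorer.
# Adversarial training pairs inputs perturbed like this with clean ones
# during training so the model learns to resist the manipulation.
def fgsm_perturb(x: np.ndarray, w: np.ndarray, y: int, eps: float = 0.1) -> np.ndarray:
    """Shift x by eps in the sign of the logistic-loss gradient.

    For L(x) = log(1 + exp(-y * w.x)), dL/dx = -y * w * sigmoid(-y * w.x),
    whose elementwise sign equals sign(-y * w).
    """
    grad_sign = np.sign(-y * w)
    return x + eps * grad_sign

def score(x: np.ndarray, w: np.ndarray) -> float:
    """Linear decision score; positive favors class +1."""
    return float(w @ x)
```

The perturbed copy always scores worse for the true class than the original, which is exactly why training on both copies makes the model harder to manipulate.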

Further, developers can incorporate input sanitization for prompt handling. This involves validating and cleaning all data inputs to ensure they are free from harmful elements that could exploit vulnerabilities in AI models. To supplement this, developers can establish strict data validation protocols and use tools to sanitize inputs before AI models process them, helping to prevent injection attacks and other malicious activity.
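A minimal sketch of such a pre-processing gate might look like the following; the length limit and pattern list are invented for illustration, and a real deployment would use far more robust detection:

```python
import re

# Illustrative prompt-sanitization gate; limits and patterns are
# examples only, not a complete injection defense.
MAX_PROMPT_LEN = 4000
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"\bsystem prompt\b", re.IGNORECASE),
]

def sanitize_prompt(raw: str) -> str:
    """Strip non-printable control characters and truncate before the
    model ever sees the input."""
    cleaned = "".join(ch for ch in raw if ch.isprintable() or ch in "\n\t")
    return cleaned[:MAX_PROMPT_LEN]

def is_suspicious(prompt: str) -> bool:
    """Flag prompts matching known injection phrases for review."""
    return any(p.search(prompt) for p in SUSPICIOUS_PATTERNS)
```

Pattern matching alone is easy to evade, which is why the practices above pair it with anomaly detection and human review rather than relying on any single filter.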

Ultimately, safer AI will come from combining technology with human expertise. Human experts are needed to make strategic decisions and address complex scenarios that require nuanced understanding and contextual knowledge, skills far beyond AI's current capabilities.

Experts believe,

AI can handle the heavy lifting of data analysis and threat detection, while human professionals apply critical thinking and experience to manage incidents and refine security protocols. Integrating AI and human expertise also enhances the adaptability and resilience of cybersecurity efforts. AI can continuously learn and adapt to new threats through machine learning algorithms, but these systems still need human oversight to ensure their outputs remain accurate and relevant.



About the author:

Karolis Liucveikis

Karolis Liucveikis - experienced software engineer, passionate about behavioral analysis of malicious apps.

Author and general operator of PCrisk's "Removal Guides" section. Co-researcher working alongside Tomas to discover the latest threats and global trends in the cyber security world. Karolis has over five years of experience working in this field. He attended KTU University and graduated with a degree in Software Development in 2017. Extremely passionate about the technical aspects and behavior of various malicious applications. Contact Karolis Liucveikis.

The PCrisk security portal is brought to you by the company RCS LT. Joined forces of security researchers help educate computer users about the latest online security threats. More information about the company RCS LT.

Our malware removal guides are free. However, if you want to support us you can send us a donation.

About PCrisk

PCrisk is a cyber security portal, informing Internet users about the latest digital threats. Our content is provided by security experts and professional malware researchers. Read more about us.
