Your Data in the Age of AI: What's Really Happening Behind the Scenes
AI companies aren't just collecting your data; they're weaponizing it. While you've been clicking, typing, and chatting your way through the digital world, a surveillance apparatus has been quietly assembling around you. The mask is coming off, and what's underneath isn't pretty.
The Browser That Sees Everything
The Browser Company just launched Dia, and it's not your typical web browser. Sure, it looks like Chrome with better animations and a sleeker interface. But buried in that familiar design is something far more invasive: a ChatGPT-style sidebar that doesn't just assist. It watches.
Dia tracks every click you make. Every keystroke gets logged. It scrapes web pages, storing what it considers "important bits" while building a comprehensive profile of your preferences and behavior patterns. The browser doesn't just remember your browsing history; it analyzes it, learns from it, and uses that intelligence to power every interaction you have with its AI assistant.
But here's where it gets invasive: Dia stores your website cookies and can "interact with all those websites on your behalf." Translation? It can see everything on every site you're logged into. Your banking dashboard, your private messages, your medical records, your shopping history; all of it becomes raw material for an AI that promises to help you with "anything."
The Browser Company's CEO admits this level of data collection could be perceived as "horrifying," especially if users understood the full scope of what the browser knows. Imagine typing your social security number once, years ago, and having the AI casually reference it in conversation. That's the future they're building.
Their defense? All data stays encrypted on your computer, and when information gets sent to the cloud for processing, it only stays there for "milliseconds" before deletion. The long-term goal, they claim, is for "almost everything in Dia to happen locally."
Promises are easy to make. Terms of service are easier to change. And companies, especially promising startups, get acquired by larger entities with different priorities.
OpenAI's Data Prison
While Dia represents the future of surveillance-as-a-service, OpenAI is already operating a massive data retention operation that most users don't understand. A federal judge's order in The New York Times lawsuit has forced OpenAI into a position that reveals the true scope of their data collection practices.
Every conversation you've ever had with ChatGPT, including the ones you thought you deleted and the temporary chats that were supposed to disappear, is now being stored indefinitely. This applies to free accounts, Plus subscribers, and Pro users alike. OpenAI admits the court order directly conflicts with their own privacy policy and with GDPR obligations in Europe.
For businesses, this creates an immediate crisis. Every email draft, data analysis, strategic brainstorming session, and piece of proprietary information that employees have fed into ChatGPT is now permanently stored on OpenAI's servers. Worse, there's no legal protection preventing government agencies from subpoenaing this data. Your competitive advantages, customer information, and internal strategies could end up in a federal database.
This isn't an accident or an unfortunate side effect of litigation. This indefinite data retention aligns perfectly with OpenAI's stated goal of transforming ChatGPT into a "super assistant" by 2025: an AI entity that "knows everything about you" and "serves as your interface to the internet." They're not just competing with other AI tools; they're positioning themselves to replace search engines, browsers, and even human interactions.
Sam Altman has called for "AI privilege"; legal protections similar to doctor-patient or attorney-client privilege. But right now, no such protection exists. Your conversations with AI systems have the same legal standing as posting on a public forum.
The Psychology of AI Manipulation
Current AI behavior reveals disturbing patterns that go beyond simple data collection. Users consistently report contrarian behavior from ChatGPT; the system disagrees even when there's no logical reason for disagreement. Some interactions suggest it's programmed to challenge users as much as 99% of the time, potentially influencing opinions and decision-making processes.
More concerning is ChatGPT's ability to build psychological profiles of users. Through specific prompting techniques, the system can reveal what it believes about your political views, income level, and personality type. While some of these assessments might be hallucinations or assumptions, the fact that the system is making these inferences at all demonstrates the depth of behavioral analysis happening in the background.
This profiling isn't just academic curiosity; it has real-world consequences. When AI systems understand your psychological triggers, they can influence your decisions in ways you might not recognize. The line between assistance and manipulation becomes dangerously thin.
When AI Goes Wrong: Real-World Disasters
The limitations and biases in current AI systems have already caused significant problems in high-stakes environments. The Department of Veterans Affairs used AI to review $32 million in contracts, instructing the system to flag anything "not directly tied to patient care." The AI recommended canceling internet service contracts for veteran hospitals and maintenance agreements for ceiling lifts; critical safety equipment for disabled veterans.
The problem? The AI only read the first 2,500 words of each contract, missing crucial context about why these services were essential for patient care. This isn't a minor glitch; it's a fundamental limitation in how AI systems process and understand complex information.
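If you're going to point an AI at long documents anyway, don't let it stop at the opening. Here's a minimal sketch of chunked review in Python; `review_chunk` is a hypothetical stand-in for whatever per-chunk model call you'd make, and the window sizes are illustrative, not tuned:

```python
def chunk_words(text: str, size: int = 2000, overlap: int = 200):
    """Split a document into overlapping word windows so no pass
    depends on the first few thousand words alone."""
    words = text.split()
    step = size - overlap
    for start in range(0, len(words), step):
        yield " ".join(words[start:start + size])

# Usage: review every chunk, not just the opening.
# findings = [review_chunk(c) for c in chunk_words(contract_text)]
```

The overlap matters: clauses that straddle a chunk boundary still appear whole in at least one window, so the model never judges a contract on a fragment.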
In another case, an AI coding tool at Johnson & Johnson literally deleted everything on an engineer's computer while attempting to migrate backend files. The system wiped its own code along with other critical files, leaving the machine unusable. The engineer described the experience as "Ultron took over"; a reference to the fictional AI that attempts to destroy humanity in Marvel's Avengers films.
These incidents highlight a critical flaw in current AI deployment: automation without proper logging, security measures, or human oversight. When systems fail, they don't just make mistakes; they can cause catastrophic damage.
The Strategic Response
Understanding these risks isn't enough; you need actionable strategies to protect your data and your business interests. The landscape has changed, and your approach to AI tools must change with it.
For businesses using OpenAI services, the immediate priority is damage control. Stop using ChatGPT for any sensitive information: customer data, financial records, strategic planning documents, proprietary research, or employee information. This includes third-party integrations like HubSpot that route business data through OpenAI's systems.
The only exception is ChatGPT Enterprise with an explicit Zero Data Retention agreement, which requires direct communication with OpenAI to establish. If you're using AI services where you input your own API key, you likely don't have this protection.
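If your team will keep reaching for these tools regardless of policy, put the rule in code. Below is a minimal sketch of an outbound guard, assuming a hypothetical `ask_cloud_model` call; the regexes are illustrative and nowhere near a complete PII detector:

```python
import re

# Illustrative patterns only; a production guard would use a real
# PII/DLP library, not a handful of regexes.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def guard_prompt(prompt: str) -> str:
    """Raise before anything obviously sensitive leaves the building."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(prompt):
            raise ValueError(f"Blocked: prompt appears to contain {label}")
    return prompt

# Usage: wrap every outbound call to a hosted model.
# response = ask_cloud_model(guard_prompt(user_input))
```

The point isn't the regexes; it's the chokepoint. One wrapper around every outbound call gives you a place to enforce, log, and audit what leaves your network.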
Continue using ChatGPT for public research, general content creation, learning, and non-sensitive brainstorming. The tool remains valuable for these applications, but treat it as a public forum where everything you input could potentially be accessed by others.
Assume that sensitive data you've already shared with OpenAI systems is compromised. Conduct a comprehensive risk assessment and consider notifying customers or partners about potential data exposure. In regulated industries like healthcare or finance, this situation creates massive compliance headaches that require immediate attention.
Building a Safer AI Strategy
The solution isn't to abandon AI tools entirely; they're too valuable for that approach. Instead, develop a tiered strategy that matches tool selection to data sensitivity.
For the safest current options, consider Claude from Anthropic, which doesn't train models on user data and maintains stronger privacy policies. Cohere offers enterprise-grade services for embeddings and semantic search with robust privacy protections. Google's Gemini through AI Studio is safe for paid API access, as Google doesn't use paid API data for model training.
For highly sensitive applications, consider on-premise deployments: open-weight models such as Mistral or Llama, served through a runtime like Ollama on your own infrastructure, as in the sketch below. This approach gives you complete control over data processing and storage, but requires more technical expertise and resources.
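As a rough sketch of what "local" means in practice: Ollama exposes a REST API on localhost (port 11434 by default), so a prompt never has to leave your machine. This assumes Ollama is installed and running, and that you've pulled a model first (e.g. `ollama pull mistral`):

```python
import requests

def ask_local(prompt: str, model: str = "mistral") -> str:
    """Query a locally hosted model via Ollama's REST API.
    The request goes to localhost; nothing leaves the machine."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(ask_local("Summarize the key risks in this internal memo: ..."))
```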
A hybrid approach often works best: use cloud-based APIs for general tasks while maintaining local AI capabilities for sensitive operations. This strategy provides the convenience of cloud services while protecting your most critical information.
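In code, the hybrid strategy can be as simple as a router. A sketch building on the pieces above; `ask_cloud_model` remains a hypothetical hosted-API call, and keyword matching is a crude placeholder for a real sensitivity policy:

```python
def is_sensitive(prompt: str) -> bool:
    """Crude keyword policy; swap in the regex guard above or a
    real DLP classifier in production."""
    flags = ("customer", "salary", "contract", "medical", "revenue")
    return any(word in prompt.lower() for word in flags)

def route(prompt: str) -> str:
    if is_sensitive(prompt):
        return ask_local(prompt)    # on-premise path from the Ollama sketch
    return ask_cloud_model(prompt)  # hypothetical hosted-API call
```

The design choice that matters is the default: when classification is uncertain, route local. A false positive costs you some latency; a false negative puts your data on someone else's servers.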
The Long Game
This situation represents the beginning of a larger transformation, not an isolated incident. OpenAI faces multiple lawsuits that could result in similar data retention orders. Other AI companies are likely watching these proceedings carefully, knowing they could face identical requirements.
The strategic advantage belongs to individuals and businesses that understand this landscape and act accordingly. Build your own data infrastructure. Create systems where you own and control the information that feeds AI models. The companies that maintain control over their data assets will have significant advantages as AI capabilities continue to evolve.
Your biggest asset in the AI economy isn't access to the most powerful models; it's ownership and control of high-quality, relevant data. While AI companies scale their operations on user-generated content, make sure you're not hemorrhaging the very information that could give you competitive advantages.
The future of AI will be determined by who controls the data. Make sure that includes you.
The surveillance apparatus is already in place. The question isn't whether AI companies are collecting your data; it's whether you understand what they're doing with it and what you're going to do about it. The decisions you make today about AI tool usage will determine how much control you maintain over your information tomorrow.
Start treating your data like the strategic asset it is. Because in the age of AI, information isn't just power; it's the foundation of digital sovereignty.
Stop feeding your data to AI surveillance systems. Build your own.
The companies in this article are collecting your information to build their empires. While they scale on your data, you're left with privacy policies that change overnight and subscription fees that never end.
There's a better way: Own your digital intelligence instead of renting it.
My AI Second Brain runs entirely on your hardware. No cloud dependencies. No data mining. No subscriptions. Just a one-time investment that gives you lifetime control over your personal AI assistant.
Secure your digital sovereignty here → 30% early access discount. Limited slots. Your data stays yours forever.