Security

Security Best Practices for AI-Powered Data Analysis

How we protect your data and what you should know about AI security.

dataTamer Team
December 1, 2025

Using AI to analyze your data is convenient. It's also potentially risky if not done right. We get a lot of questions about security, so here's a straightforward explanation of how dataTamer handles your data and what you should watch out for in general.

What actually gets sent to AI models

When you ask a question, we send three things to the LLM: your question, metadata about your database schema (table names, column names, data types), and sometimes sample rows of data.

We don't dump your entire database to OpenAI or Anthropic. That would be absurd and expensive. Only the specific information needed to answer your query gets sent.
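As a rough sketch, the context assembled for a single question might look like the following. The field names and structure here are illustrative, not dataTamer's actual wire format:

```python
# Illustrative sketch of the context sent alongside a user question.
# Field names are hypothetical -- not dataTamer's actual API format.

def build_llm_context(question, schema, sample_rows=None):
    """Assemble the minimal context needed to answer one query."""
    payload = {
        "question": question,
        # Schema metadata only: table/column names and types, not row data.
        "schema": schema,
    }
    # Sample rows are included only when the query actually needs them.
    if sample_rows:
        payload["sample_rows"] = sample_rows
    return payload

context = build_llm_context(
    "Which customers signed up last month?",
    schema={"customers": {"id": "int", "email": "text", "created_at": "timestamp"}},
)
print(sorted(context.keys()))  # → ['question', 'schema']
```

The point of the sketch: absent sample rows, nothing but the question and schema metadata leaves the system.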

For document analysis, the relevant chunks of text from your PDFs get sent as context. Not whole documents every time – just the parts that match your question.
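The chunk-selection step can be sketched with a naive keyword-overlap ranking. Real retrieval systems typically use embedding similarity; the function below is purely illustrative:

```python
# Naive retrieval sketch: rank document chunks by word overlap with the
# question and send only the top matches as LLM context. Production
# systems generally use embeddings; this just illustrates the idea.

def top_chunks(question, chunks, k=2):
    """Return up to k chunks with the most words in common with the question."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(c.lower().split())), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]

chunks = [
    "Revenue grew 12% year over year in Q3.",
    "The office relocated to a new building in March.",
    "Q3 revenue was driven by enterprise renewals.",
]
print(top_chunks("What drove q3 revenue growth?", chunks))
```

Only the returned chunks would be sent to the model; the chunk about the office move never leaves storage.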

Encryption in transit and at rest

Everything is encrypted when moving between your browser, our servers, and the AI providers. Standard TLS, same as your bank uses.

Stored data (your credentials, uploaded documents, connection settings) is encrypted at rest using AES-256. If someone somehow got access to our database servers, they'd just see encrypted blobs.

This is table stakes for any serious data tool, but worth stating explicitly.

Database credentials

Your database passwords are encrypted and never sent to AI models. They're only used to establish direct connections from our servers to your database.

We support SSH tunneling if your database isn't publicly accessible. This means we connect through a secure intermediary instead of requiring you to expose your database to the internet.

You can also use read-only database credentials. If you're nervous about granting write access (which is totally reasonable), create a user with only SELECT permissions.
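In Postgres that's a `CREATE USER` followed by `GRANT SELECT` on the relevant tables. The principle is easy to verify locally: the database itself rejects writes, regardless of what the connecting tool tries. Here's a demonstration using SQLite's read-only URI mode (purely illustrative; dataTamer connects to your own database):

```python
# Demonstrate read-only database access using SQLite's mode=ro URI.
# Same principle as a Postgres user granted only SELECT: reads succeed,
# writes are rejected by the database engine itself.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
with sqlite3.connect(path) as rw:
    rw.execute("CREATE TABLE transactions (id INTEGER, amount REAL)")
    rw.execute("INSERT INTO transactions VALUES (1, 42.0)")

# Open the same file read-only.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
print(ro.execute("SELECT amount FROM transactions").fetchone())  # → (42.0,)

try:
    ro.execute("INSERT INTO transactions VALUES (2, 99.0)")
except sqlite3.OperationalError as e:
    print("write rejected:", e)
```

With a SELECT-only user, even a buggy or malicious query can't modify your data; the rejection happens at the database, not in application code.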

What the AI providers see

When we send queries to OpenAI, Anthropic, or Grok, those providers see the data we send. That's unavoidable – they need context to generate answers.

However, all three providers we work with contractually agree not to train their models on customer API data. Your queries aren't used to improve GPT or Claude for everyone else.

This is covered in their business/API terms of service. It's different from using ChatGPT or Claude through their consumer products, where conversations may be used for training unless you opt out.

Data retention

We store your conversation history so you can review past queries. This is kept encrypted on our servers.

You can delete individual conversations or your entire history at any time. Deleted data is removed from our live systems immediately and persists only in standard backups, which roll off within about 30 days.

The AI providers keep API request logs for a limited time (usually 30 days) for debugging and abuse prevention. After that, they're deleted on their end too.

What you should be careful about

Don't paste raw sensitive data into queries

If you're tempted to copy/paste credit card numbers, social security numbers, or other PII directly into a query – don't.

Instead, reference them by ID or use your database's built-in privacy features (like masking). Ask "Show me transactions for user_id 12345" not "Show me transactions for John Smith at [email protected]."

Be mindful of what you upload

If you upload documents containing sensitive information, they're stored on our servers (encrypted) and chunks of them get sent to AI models when relevant.

This is fine for research papers, public reports, or internal documentation. Be more careful with anything containing personal data or trade secrets.

Use appropriate access controls

Don't share dataTamer login credentials between people. Create separate accounts so you can track who's accessing what.

If someone leaves your company, disable their account immediately. Basic stuff, but people forget.

Compliance considerations

We're SOC 2 Type II compliant, which means we've been audited on our security practices. If your compliance team asks about this, that's the certification they care about.

For GDPR: we have data processing agreements available for EU customers. Your users' data can be processed and stored within the EU if required.

For HIPAA: we offer Business Associate Agreements for healthcare customers, and additional security controls are enabled for these accounts.

What we can't protect against

If someone steals your laptop and it's not encrypted, they might access your dataTamer account if you're logged in. Use full disk encryption and lock your screen.

If you use weak passwords or reuse passwords across services, your account could get compromised. Use a password manager and enable 2FA.

If your database itself is insecure (weak credentials, no network restrictions, unpatched vulnerabilities), that's outside our control. Secure your database regardless of what tools connect to it.

The honest assessment

Using AI with your data does introduce some risk – mainly that your data is being sent to third-party AI providers. We mitigate this with encryption, strict agreements with providers, and by minimizing what data gets sent.

But there's no such thing as zero risk. If your data is so sensitive that it can never leave your infrastructure under any circumstances, AI-powered analysis might not be appropriate yet. On-premise or self-hosted options exist but come with their own trade-offs.

For most businesses, the risk is acceptable given the security measures in place. But you have to make that call based on your specific compliance requirements and risk tolerance.

If you have specific security questions or need detailed documentation for your security team, reach out. We're happy to provide technical details about our architecture and practices.