Data Privacy in AI-Copilot

At SGS Digicomply, we are committed to transparently explaining how we utilize Generative Artificial Intelligence (AI) in our AI-Copilot to enhance your research and insights while prioritizing your data privacy and security.

This document outlines our current and potential future AI applications and the measures we take to protect your information, including our use of a private, in-house Large Language Model (LLM) for private data and third-party providers for public data.

Our Commitment to You

Regardless of how AI technology evolves, we remain firm in our commitment to:

- Always prioritize the privacy and security of your data.
- Utilize our private, in-house LLM for all private customer data, ensuring it remains within the SGS Digicomply cloud.
- Utilize third-party providers exclusively for public data, solely for the purpose of fulfilling your requests related to that data.
- Never transmit private customer data to third-party providers, and never share data in a manner that could identify our clients.
- Never train our models, nor permit third-party providers to train their models, on customer data.
- Inform you of changes in our use of third-party providers, if applicable.
- Maintain the option for you to opt out of these features.

How We Use Customer Data

Private Data:
- Definition: All data generated directly by the user, including every query, comment, and document uploaded to the system.
- Handling: We utilize our own instance of an open-source LLM hosted securely within the SGS Digicomply cloud. This ensures your private data never leaves our controlled environment. Our private LLM processes your inputs to provide accurate responses while maintaining complete data privacy.

Public Data:
- Definition: Publicly available information or data explicitly designated as public.
- Handling: We may utilize trusted third-party LLM providers. This allows us to leverage external capabilities for broader data processing when privacy concerns are mitigated by the public nature of the data.

Which Third-Party Providers Do We Use?

Our AI infrastructure is built on distinct layers:

Google Cloud Vertex AI

Vertex AI is Google Cloud’s AI platform for building and deploying machine learning models at scale. For features that involve user queries - including private data - we use a version of Google Cloud’s AI that is deployed within our own dedicated cloud environment. Because it runs inside our environment, this data remains under our control and is not accessible to Google.

Private open-source LLM (fallback for private data)

We also operate a private open-source LLM hosted entirely within our own infrastructure. This model processes private data and also serves as the fallback for private data if Google Cloud is unavailable. At no point does this data leave our controlled environment.

OpenAI (fallback for public data only)

OpenAI is used exclusively as a fallback for processing public data in cases where Google Cloud is unavailable. Private customer data - including user queries and uploaded documents - is never routed through OpenAI.

Custom fine-tuned Gemini models (hosted in Google Cloud)

These are models we have fine-tuned ourselves for large-scale extraction tasks. They are trained exclusively using proprietary datasets created by our own SGS subject-matter experts - never on client data or user inputs.

Custom fine-tuned open-source models (hosted in Google Cloud)

Similarly, these are only trained with proprietary datasets created by our own SGS subject-matter experts and are hosted within our Google Cloud environment.

In all cases, we do not permit any of these providers to use data processed on our behalf for training their models.
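
The routing rules described in this section can be summarized in a minimal sketch. It is an illustration only: the backend labels, the `Sensitivity` enum, and the `choose_backend` function are hypothetical names chosen for this example and do not describe the actual implementation.

```python
from enum import Enum


class Sensitivity(Enum):
    PRIVATE = "private"  # user queries, comments, uploaded documents
    PUBLIC = "public"    # publicly available or explicitly public data


# Hypothetical backend labels, ordered by preference (primary first, then fallback).
# OpenAI appears only in the public route, so private data can never reach it.
ROUTES = {
    Sensitivity.PRIVATE: ["google_cloud_vertex_ai", "private_open_source_llm"],
    Sensitivity.PUBLIC: ["google_cloud_vertex_ai", "openai"],
}


def choose_backend(sensitivity, available):
    """Return the first permitted backend that is currently available."""
    for backend in ROUTES[sensitivity]:
        if backend in available:
            return backend
    # Fail closed: never fall through to a backend outside the permitted route.
    raise RuntimeError(f"no permitted backend available for {sensitivity.value} data")
```

In this sketch, a private request made while Google Cloud is unavailable resolves to the private open-source LLM, and it raises an error rather than using OpenAI if that model is also unavailable.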

How do you improve the service if no client data is used in training?

We continuously improve our service based on how it's used - not on what clients submit. Feedback such as support tickets and reported issues helps us identify areas that need attention. For example, if users report that product categories are being misclassified, our team investigates the root cause. From there, we build targeted training datasets using publicly available data and the expertise of our own SGS subject-matter experts, whose annotations guide the model improvements. At no point is client input data - including user queries or uploaded documents - used as part of any training set.
