Launching Synthetic Data Studio within a project
You can launch Synthetic Data Studio on the Cloudera AI Platform to generate datasets and evaluate them.
Agent Studio integrates with two major enterprise inference services:
-
Cloudera AI Inference Service: It offers enterprise-grade deployment options.
To enable Cloudera AI Inference service for Synthetic Data Studio, ensure the followings:- The environment variable responsible for enabling Cloudera AI Inference service
is
CDP_TOKEN
. By defaultCDP_TOKEN
is set tonull
. If left asnull
, the application will use the JWT stored at/tmp/jwt
to run Cloudera AI Inference service. Alternatively, if you provide a value forCDP_TOKEN
during the pre-installation configuration of environment variables, it will override the default and be used for authentication. - Ensure that the Cloudera AI Inference service endpoints and model IDs are readily available. You will be prompted to provide these details if you choose Cloudera AI Inference service as the AI inference option in Synthetic Data Studio (SDS).
- All endpoints used must conform to the OpenAI API standard.
- For Cloudera AI
on premises, you must use
CDP_TOKEN
for authentication. Auto-generated tokens stored in/tmp/jwt/
are not yet available in the Cloudera AI on premises version.
For more details, see Authenticating Cloudera AI Inference service.
- The environment variable responsible for enabling Cloudera AI Inference service
is
- AWS Bedrock: It provides scalable cloud-based inference.
Environment Variables: Before installation, Synthetic Data Studio must be
configured with the necessary environment variables -
CDP_TOKEN-
to enable the
Cloudera AI Inference service.AWS_DEFAULT_REGION:
Defaults to the us-east-1 region.*AWS_ACCESS_KEY_ID:
Your AWS access key ID.*AWS_SECRET_ACCESS_KEY:
Your AWS secret access key.*Hf_token:
Your Hugging Face token for exporting datasets.Hf_username:
Your Hugging Face username.CDP_TOKEN:
Overrides the JWT token for Cloudera AI Inference service.