Key Features of Synthetic Data Studio

Learn the key features of Synthetic Data Studio and how they enable scalable synthetic data generation for advanced AI use cases.

Supervised Fine-Tuning Dataset Generation
- Automatically generates high-quality prompt-completion pairs from raw or redacted documents by creating question-answer sets or summaries.
- Well suited to fine-tuning machine learning models, especially when the source content is unstructured or training data is scarce.
- Supports the creation of task-specific datasets tailored to enterprise needs.
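
As a rough illustration of what a prompt-completion pair can look like, the sketch below turns question-answer pairs into records suitable for supervised fine-tuning. It is not the Synthetic Data Studio implementation: the `SFTExample` class, the `qa_to_sft` helper, the prompt template, and the sample data are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass, asdict


@dataclass
class SFTExample:
    """One supervised fine-tuning record (hypothetical schema)."""
    prompt: str
    completion: str


def qa_to_sft(qa_pairs, source_title):
    """Turn (question, answer) tuples into prompt-completion records.

    Pairs with an empty question or answer are skipped.
    """
    examples = []
    for question, answer in qa_pairs:
        question, answer = question.strip(), answer.strip()
        if not question or not answer:
            continue  # incomplete pair, not usable for training
        # Assumed prompt template; a real pipeline would define its own.
        prompt = (
            f"Answer using the document '{source_title}'.\n\n"
            f"Question: {question}"
        )
        examples.append(SFTExample(prompt=prompt, completion=answer))
    return examples


# Made-up question-answer pairs standing in for generated output.
qa_pairs = [
    ("What is the refund window?", "30 days from delivery."),
    ("Who approves exceptions?", ""),  # dropped: no answer
]
examples = qa_to_sft(qa_pairs, "Returns Policy")
records = [asdict(e) for e in examples]
```

Each entry in `records` is a plain dictionary with `prompt` and `completion` keys, a shape that many fine-tuning tools accept.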
Document-Based Generation
- Advanced document processing capabilities allow users to generate synthetic data directly from uploaded document collections.
- Produces domain-specific datasets that are consistent with existing documentation and knowledge bases.
Custom Generation Workflows
- Offers a flexible workflow system to process user-provided inputs and create tailored data generation pipelines.
- Helps align the generated data with specific use cases and enterprise requirements.
Evaluation Workflow
- Supports the evaluation of generated supervised fine-tuning (SFT) datasets using an LLM-as-a-judge.
- Provides evaluation scores and detailed justifications for each generated prompt-completion pair.
- Allows you to define custom scoring criteria and justification parameters for precise quality assessments.
- Helps verify that datasets meet your quality criteria before they are used for fine-tuning.
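
To make the idea of scores and justifications concrete, the sketch below filters judged prompt-completion pairs by a minimum score. It assumes the judge has already run; the `Judgement` class, the 1-5 scale, the threshold, and the sample records are hypothetical and do not reflect the product's actual output format.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    """One judged pair (hypothetical schema, scores on an assumed 1-5 scale)."""
    prompt: str
    completion: str
    score: float
    justification: str


def filter_by_score(judgements, min_score):
    """Split judged pairs into those at or above the threshold and the rest."""
    kept = [j for j in judgements if j.score >= min_score]
    dropped = [j for j in judgements if j.score < min_score]
    return kept, dropped


def mean_score(judgements):
    """Average score, or 0.0 for an empty list."""
    if not judgements:
        return 0.0
    return sum(j.score for j in judgements) / len(judgements)


# Made-up judge output for two generated pairs.
judged = [
    Judgement("What is the refund window?", "30 days from delivery.",
              5.0, "Accurate and grounded in the source document."),
    Judgement("Who approves exceptions?", "Someone in finance.",
              2.0, "Vague; the source names a specific role."),
]
kept, dropped = filter_by_score(judged, min_score=4.0)
average = mean_score(judged)
```

Keeping the dropped pairs alongside their justifications makes it easier to see why a pair was rejected and to adjust the scoring criteria.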
Export Functionality
- Enables datasets to be exported to the Project File System for easy access.
- Supports uploading datasets to Hugging Face for sharing and further model training.
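
As a minimal sketch of what an exported dataset file could look like, the example below writes prompt-completion records to a JSON Lines file and reads them back. The JSON Lines format, the `exports/sft_dataset.jsonl` path, and the `write_jsonl` and `read_jsonl` helpers are assumptions made for illustration; the export format the product actually uses may differ.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line, creating parent folders as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dictionaries."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Made-up records standing in for a generated dataset.
records = [
    {"prompt": "What is the refund window?",
     "completion": "30 days from delivery."},
    {"prompt": "Summarize the returns policy.",
     "completion": "Items can be returned within 30 days."},
]

# A temporary directory stands in for a project folder here.
out_dir = Path(tempfile.mkdtemp())
out_path = write_jsonl(records, out_dir / "exports" / "sft_dataset.jsonl")
round_trip = read_jsonl(out_path)
```

A file in this shape can be loaded with the Hugging Face `datasets` library through `load_dataset("json", data_files=...)`, which is one reason JSON Lines is a common interchange format for fine-tuning data.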