Profiler data testing

Note the following information about the data used to validate the profiler services.

Test data for Compute Cluster enabled environments

The following dataset has been validated and works as expected for Compute Cluster enabled environments.

  • Supported asset types:
    • External tables:
      • CSV
      • Parquet
      • Iceberg
      • Avro
      • ORC
    • Managed tables:
      • CSV
      • Parquet
      • Iceberg
      • Avro
      • ORC
  • Scheduled profiling results:
    • Total data tested:
      • Total data volume: 5 TB
      • Total table count: 7700
    • Tested tables:
      • 600 Parquet tables
      • 6500 CSV tables
      • 300 Iceberg tables
      • 150 Avro tables
      • 150 ORC tables
  • On-demand profiling results:
    • Parquet table: 250 GB, with a 17 GB sample
      • Statistics Collector profiler:
        • 30 executors / 9 mins
        • 20 executors / 14 mins
        • 10 executors / 30 mins
      • Data Compliance profiler:
        • 30 executors / 13 mins
        • 20 executors / 18 mins
        • 10 executors / 35 mins
    • CSV table: 430 GB, with a 28 GB sample
      • Statistics Collector profiler:
        • 30 executors / 23 mins
        • 20 executors / 40 mins
        • 10 executors / 86 mins
      • Data Compliance profiler:
        • 30 executors / 42 mins
        • 20 executors / 63 mins
        • 10 executors / 70 mins
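
To compare how the on-demand run times change with executor count, the figures above can be turned into speedup ratios relative to the 10-executor run. The following Python sketch is illustrative only: the `RUNTIMES` dictionary and the `speedups` and `report` helpers are hypothetical names introduced here, not part of any profiler API, and the only inputs are the run times listed above.

```python
# Run times in minutes, copied from the on-demand profiling results above.
# Outer key: (table format, profiler); inner key: executor count.
RUNTIMES = {
    ("Parquet", "Statistics Collector"): {10: 30, 20: 14, 30: 9},
    ("Parquet", "Data Compliance"): {10: 35, 20: 18, 30: 13},
    ("CSV", "Statistics Collector"): {10: 86, 20: 40, 30: 23},
    ("CSV", "Data Compliance"): {10: 70, 20: 63, 30: 42},
}


def speedups(times, baseline=10):
    """Return {executors: speedup relative to the baseline executor count}."""
    base = times[baseline]
    return {n: base / t for n, t in sorted(times.items())}


def report():
    """Build one summary line per (table, profiler, executor count)."""
    lines = []
    for (table, profiler), times in RUNTIMES.items():
        for n, s in speedups(times).items():
            lines.append(
                f"{table:8} {profiler:21} {n:2d} executors: "
                f"{times[n]:3d} min, {s:.2f}x vs 10 executors"
            )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

For example, the Statistics Collector profiler on the Parquet table runs about 3.3 times faster with 30 executors than with 10 (30 minutes versus 9 minutes), while the Data Compliance profiler on the CSV table runs about 1.7 times faster (70 minutes versus 42 minutes). These ratios describe the tested dataset only and may not carry over to other workloads.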