Cloudera Data Catalog Data Sharing Concurrent User Tests
Performance testing of Data Sharing with Cloudera Data Catalog in Cloudera on cloud 7.3.2.0 showed stable operation with up to 100 concurrent users. At 150 concurrent users, 7.2% of requests failed with gateway timeouts. Cloudera recommends limiting concurrent users to 100 for stable operation.
Test Environment
Tests were run for 30 minutes per user level against a cluster running Cloudera on cloud 7.3.2.0. The data set consisted of 50 databases with 100 tables and 1000 columns each. A total of 1000 data shares were created, each sharing 5 tables with 200 external users.
| Concurrent Users | Requests | Avg. Response Time (milliseconds) | Errors |
|---|---|---|---|
| 50 | 9877 | 9087 | 0 |
| 100 | 9945 | 17484 | 0 |
| 150 | 10113 | 25222 | 728 (7.2%) |
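The derived figures can be checked with a short calculation. The following Python sketch recomputes the error rate from the table and, as an illustration only, estimates throughput and the number of requests in flight. It assumes that each Requests value is the total for one 30-minute run, which the table does not state explicitly. The names `RESULTS`, `TEST_DURATION_SECONDS`, and `summarize` are hypothetical.

```python
# Figures copied from the results table:
# (concurrent users, requests, avg response time in ms, errors)
RESULTS = [
    (50, 9877, 9087, 0),
    (100, 9945, 17484, 0),
    (150, 10113, 25222, 728),
]

# Assumption: each Requests value is the total for one 30-minute run.
TEST_DURATION_SECONDS = 30 * 60


def summarize(users, requests, avg_ms, errors):
    """Derive error rate, throughput, and estimated in-flight requests."""
    error_rate_pct = 100.0 * errors / requests
    throughput_rps = requests / TEST_DURATION_SECONDS
    # Little's law: requests in flight = throughput * average response time.
    in_flight = throughput_rps * (avg_ms / 1000.0)
    return {
        "users": users,
        "error_rate_pct": error_rate_pct,
        "throughput_rps": throughput_rps,
        "in_flight": in_flight,
    }


SUMMARY = [summarize(*row) for row in RESULTS]

if __name__ == "__main__":
    for s in SUMMARY:
        print(
            f"{s['users']:>4} users: "
            f"{s['error_rate_pct']:.1f}% errors, "
            f"{s['throughput_rps']:.2f} req/s, "
            f"~{s['in_flight']:.0f} requests in flight"
        )
```

Under that assumption, throughput stays near 5.5 requests per second at every user level, and the estimated number of requests in flight (about 50, 97, and 142) tracks the number of concurrent users. This suggests, without proving it, that adding users mostly lengthened response times rather than increasing the work completed.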
Conclusion
Data Catalog stress tests showed reliable performance with up to 100 concurrent users. At 150 concurrent users, 7.2% of requests (728 of 10113) failed, with errors observed when listing external users and listing data shares. In these cases, response times exceeded 60 seconds, resulting in request timeouts. The errors seen at 150 users were:
504 Gateway Time-out
Status Code: 504; Error Code: UNKNOWN_ERROR; Service: datacatalog; Operation: listExternalUsers; Request ID: Unknown;
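When reviewing client logs from a similar test, it can help to split such an error line into fields so that failures can be counted per operation. The sketch below is a minimal example that assumes the line always consists of `Key: value;` pairs, as in the sample above. The function names `parse_error_line` and `is_gateway_timeout` are hypothetical and are not part of any Cloudera client.

```python
def parse_error_line(line):
    """Split a 'Key: value; Key: value;' error line into a dictionary.

    Assumes the format of the sample error shown above; segments
    without a colon are skipped.
    """
    fields = {}
    for part in line.split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition(":")
        if not sep:
            continue
        fields[key.strip()] = value.strip()
    return fields


def is_gateway_timeout(fields):
    """Return True when the parsed fields describe an HTTP 504 response."""
    return fields.get("Status Code") == "504"


# Sample line copied from the errors listed above.
SAMPLE = (
    "Status Code: 504; Error Code: UNKNOWN_ERROR; Service: datacatalog; "
    "Operation: listExternalUsers; Request ID: Unknown;"
)
PARSED = parse_error_line(SAMPLE)

if __name__ == "__main__":
    print(PARSED["Service"], PARSED["Operation"], is_gateway_timeout(PARSED))
```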
Recommendation
Cloudera recommends keeping the number of concurrent users accessing the Cloudera Data Catalog APIs at or below 100 for stable operation. Exceeding this threshold is an extreme scenario that may lead to degraded performance and timeout errors.
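One way a client application might stay within this limit is to cap the number of requests it has in flight at any time. The following Python sketch uses a semaphore for that purpose. It is an illustration under stated assumptions, not a Cloudera-provided mechanism: `ConcurrencyLimiter` and `fake_list_data_shares` are hypothetical names, and the placeholder function only sleeps instead of calling a real API.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Ceiling taken from the recommendation above.
MAX_CONCURRENT_REQUESTS = 100


class ConcurrencyLimiter:
    """Allow at most `limit` calls to run at the same time."""

    def __init__(self, limit=MAX_CONCURRENT_REQUESTS):
        self._slots = threading.BoundedSemaphore(limit)

    def run(self, func, *args, **kwargs):
        # Block until a slot is free, then release it when the call ends.
        with self._slots:
            return func(*args, **kwargs)


def fake_list_data_shares(request_id):
    """Hypothetical placeholder for an API call; it only sleeps briefly."""
    time.sleep(0.01)
    return request_id


if __name__ == "__main__":
    limiter = ConcurrencyLimiter()
    # 150 worker threads compete, but at most 100 calls run at once.
    with ThreadPoolExecutor(max_workers=150) as pool:
        results = list(
            pool.map(lambda i: limiter.run(fake_list_data_shares, i), range(300))
        )
    print(len(results), "calls completed")
```

A limiter like this only bounds requests issued from a single process. The threshold in this document applies to all concurrent users of the service, so several independent clients would need to coordinate or each use a smaller limit.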
