Fixed Issues in Atlas
Review the list of Apache Atlas issues that are resolved in Cloudera Runtime 7.2.2.
- CDPD-1138: Spark Atlas Connector tracks column-level lineage
- This issue is now resolved.
- CDPD-11790: Shell entity is not resolved to the Complete entity under certain conditions.
- Shell entities with a duplicate qualifiedName are no longer created when processing messages from the Spark Atlas Connector. This issue is now resolved.
- CDPD-14031: In the Spark Atlas Connector, a few S3 entities are created using the V1 S3 model instead of the updated V2 S3 model.
- The Spark Atlas Connector now uses the Atlas S3 V2 models. This issue is now resolved.
- OPSAPS-57947: Kafka Broker SSL configuration is not correct in High Availability mode.
- When deploying the DataHub in High Availability mode, some of the Ranger and Atlas configurations are not computed correctly, in particular atlas.kafka.security.protocol in Atlas, the SSL properties, and the REST URL services that depend on Ranger.
- CDPD-13645: A request contains sortBy=name, but the 'name' attribute is not in the hive_storagedesc definition. Also, if sortBy is not passed, the default sort attribute is name.
- Atlas now validates that the sortBy attribute passed in the request is present in the relationship end definition. If it is not present, sorting is ignored.
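
The validation described above can be pictured with a small sketch. This is an illustration of the rule only, not Atlas code: the function name, the dict-based entity shape, and the set of end-definition attribute names are all assumptions made for the example.

```python
def sort_related_entities(entities, end_def_attributes, sort_by=None, descending=False):
    """Sort related entities only when the sort attribute is defined.

    entities: list of dicts mapping attribute names to values (hypothetical shape).
    end_def_attributes: attribute names in the relationship end definition.
    """
    # 'name' is the default sort attribute when sortBy is not passed.
    attribute = sort_by or "name"
    if attribute not in end_def_attributes:
        # The attribute is not part of the definition: ignore sorting
        # and return the entities in their original order.
        return list(entities)
    return sorted(
        entities,
        key=lambda entity: str(entity.get(attribute, "")),
        reverse=descending,
    )
```

With this rule, a type such as hive_storagedesc that has no 'name' attribute is returned unsorted instead of being sorted on an attribute it does not define.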
- CDPD-10873
- Fixed quick search aggregation metrics when filtered with System Attributes.
- CDPD-13805: Relationship API requests can now specify the attributes to include in the search result.
- Example Request: /v2/search/relationship?guid=ac9e04cc-f927-4334-af08-c83bc3733f5b&relation=columns&sortBy=name&sortOrder=ASCENDING&attributes=dcProfiledData
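
A request like the one above can be assembled programmatically. The following is a minimal sketch, assuming the path is appended to an Atlas REST base URL; the host, port, base path, and helper name are hypothetical, and passing several attributes as repeated query parameters is an assumption not confirmed by this note.

```python
from urllib.parse import urlencode


def build_relationship_search_url(base_url, guid, relation, attributes=(),
                                  sort_by=None, sort_order="ASCENDING"):
    """Build a relationship search URL with optional sorting and attributes."""
    params = {"guid": guid, "relation": relation}
    if sort_by:
        params["sortBy"] = sort_by
        params["sortOrder"] = sort_order
    if attributes:
        # Assumption: multiple attributes are sent as repeated parameters.
        params["attributes"] = list(attributes)
    query = urlencode(params, doseq=True)
    return f"{base_url.rstrip('/')}/v2/search/relationship?{query}"


# Hypothetical base URL; replace with the Atlas endpoint of your cluster.
url = build_relationship_search_url(
    "http://atlas.example.com:31000/api/atlas",
    guid="ac9e04cc-f927-4334-af08-c83bc3733f5b",
    relation="columns",
    attributes=["dcProfiledData"],
    sort_by="name",
)
print(url)
```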
- CDPD-11681:
- Search results can be filtered by multiple entity types by passing a comma-separated string of type names as typeName in the request, for example "typeName": "hive_table,hive_db".
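
As a rough sketch of how a client might build that filter, the helper below joins a list of type names into the comma-separated typeName value. The helper name is hypothetical, and the remaining fields of the search request, as well as the endpoint it is sent to, are not covered by this note and are left out.

```python
import json


def build_type_filter(type_names):
    """Return a request fragment that filters on several entity types."""
    cleaned = [name.strip() for name in type_names if name.strip()]
    for name in cleaned:
        if "," in name:
            # A comma inside a name would be read as a separator.
            raise ValueError(f"type name must not contain a comma: {name!r}")
    return {"typeName": ",".join(cleaned)}


body = build_type_filter(["hive_table", "hive_db"])
print(json.dumps(body))
```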
- CDPD-13199: Incorrect attribute values in bulk import
- When importing Business Metadata attribute assignments, Atlas used only the last assigned attribute value instead of individual values for each entity in the import list.
- CDPD-372: All Spark Queries from the Same Spark Session were included in a Single Atlas Process
- A Spark session can include multiple queries. When Atlas reported the Spark metadata, it created a single process entity to correspond to the Spark session. The result was that an Atlas lineage picture showed multiple input entities or multiple output entities for a process, but the inputs and outputs were only related by the fact that they were included in operations in the same Spark session. In this release, the Spark Atlas Connector produces a spark_application entity for each Spark job. Each data flow produced by the job creates a spark_process entity in Atlas, which tracks the actual input and output data sets for that process. For more information, see Spark metadata collection.
- CDPD-12620: Migration progress bar not refreshed
- During the import stage of Cloudera Navigator to Apache Atlas migration, the migration progress bar does not correctly refresh the migration status. The Statistics page in the Atlas UI displays the correct details of the migration.
- CDPD-10151: Short Spark job processes may be lost
- On rare occasions, events captured for Atlas by the Spark Atlas Connector can be dropped before the metadata reaches Atlas. An event is more likely to be lost in very short-running jobs.
- CDPD-6042: Hive Default Database Location Incorrect in Atlas Metadata
- The location of the default Hive database as reported through the HMS-Atlas plugin does not match the actual location of the database. This problem does not affect non-default databases.
- CDPD-4662: Lineage graph links not working
- Atlas lineage graphs do not include hyperlinks from assets to the assets' detail pages, and clicking an asset does not produce an error in the log. Clicking an edge in a graph still provides access to edge behavior options such as controlling how classifications propagate.
- CDPD-3700: Missing Impala and Spark lineage between tables and their data files
- Atlas does not create lineage between Hive tables and their backing HDFS files for CTAS processes run in Impala or Spark.