Known Issues in Apache Atlas

Learn about the known issues in Apache Atlas, the impact or changes to the functionality, and the workaround.

CDPD-53176: Partition Specification data for Iceberg Table is not sent to Atlas in Hook context
When a Iceberg table is created with partition spec, partition specification data is not sent to Atlas in Hook context. The partition specification data is stored differently for Hive than for Spark and Impala.

For example, for Spark and Impala, the partition data is present in Table parameters.default-partition-spec but for Hive partition data is stored in Partition Transform Information and not Table parameters.default-partition-spec. In case of Hive, Atlas is not getting Partition Transform Information or Table parameters.default-partition-spec from Hook context.

CDPD-59413: Plugin is not supported with older Atlas server versions for Iceberg tables

Copy the model file 1130-iceberg_table_model.json to the directory: /opt/cloudera/parcels/CDH/lib/atlas/models/1000-Hadoop.

Proceed to restart the Atlas Service using Cloudera Manager.

CDPD-56590: Create table "like" from Iceberg table creates a hive_table instead of iceberg_table
By default, for tables created using the "like" command, lineage is not generated in Atlas. The destination like table should be of the same type as source table. Instead an iceberg_table for source and hive_table for destination are getting created.
CDPD-56085: [Impala Iceberg] LOAD DATA INPATH to Iceberg_table creates a temporary hive_table with name <iceberg_table_name>_tmp* and then marks it as DELETED in Atlas
Running a query like "LOAD DATA INPATH to iceberg_table", creates a temporary hive_table with name <iceberg_table_name>_tmp* and then marks it as DELETED in Atlas. So in Atlas, a deleted entity is created corresponding to the temporary table "<iceberg_table_name>_tmp*".
Tag added to the File system (HDFS) entity will not be propagated to the Iceberg table, user has to manually add to the iceberg_table, since the tag propagation is broken due to the deleted table in the flow.
CDPD-48122: Operations like admin/audits, admin/purge fail with a 500 internal server error message "[__AtlasAuditEntry.startTime] is not indexed in the targeted index [vertex_index]"
CDPD-67112: Import transforms do not work as expected when replacing a string which already has ":"
CDPD-67022: Export/Import: When a user with export/import permissions does not have the permission to create/read/write/update entity, the import operation fails with 403 error
The entity permission for __AtlasAuditEntry has to be present.
CDPD-67020: When a user has export/import permissions but no other permission on entity (read/write/update), export operation throws an error, import operation fails
The entity permission for AtlasServer and __ExportImportAuditEntry has to be present.
CDPD-65806: After upgrading from Cloudera Runtime 7.2.17 to 7.2.18, not all Iceberg table relationships are visible in the entity details page
OPSAPS-68461: Update GC and JVM options for Atlas service for supporting JDK17 in main Atlas CSD
Existing ATLAS OPTS does not work for JDK17. You must manually update ATLAS_OPTS.
DOCS-19084: Atlas Rolling Upgrade related to Zero Downtime Upgrade (ZDU)
Upgrade process comprises of upgrading Cloudera Manager + Runtime upgrade + Operating System upgrade. Though Atlas cannot comply with a full ZDU process, there is no data loss observed through the entire upgrade process. Post upgrade, all the created entities before and during the upgrade process are available without any changes or modifications.

Some limitations that are observed during the ZDU process:

  • While Atlas goes through the process of rolling upgrade, some downtime might be expected because Atlas does not support Active-Active model. Failover consumes sometime since Active-Passive is the currently supported model. As the Passive instance becomes Active, there is some downtime where Atlas is not reachable and the messages from clients are queued up in Kafka.
  • Solr does not support Rolling Upgrade due to which Atlas REST requests fail during the Solr upgrade.
  • Nodes unavailable due to OS Upgrade: Due to nodes going down and services not being accessible. (Not limited to Atlas but to also other available services).
  • During the Zookeeper service upgrade process, any API requests routed to the Passive node shall result in 503 error, until the Active instance is up.
  • During the Rolling upgrade, when the Hbase backup is being carried out, Atlas service is stopped and there can be a minimal disruption till the HBase backup process is completed. Atlas service gets restarted once the HBase backup process is completed.
CDPD-62973: Change in audits behavior in Cloudera Runtime 7.2.18 deployment.
When the value of differential audits is set as true, the audit information is not segregated based on the user which is firing the query. The HMS service user information includes details of the service user. When differential audit is enabled, only the difference between the two subsequent audits is logged, but in this case, there is no change in the data which is retrieved from HS2 and HMS, which does not create the audit. The user information is audited fine when differential audit is disabled
DOCS-19610: After upgrading from Cloudera Runtime 7.2.17 to 7.2.18, not all Iceberg table relationships are visible in the entity details page
The following entities are affected:
  • relationshipAttributes
  • hive partition_spec
  • database details missing
  • ddl_queries
CDPD-63397: During Data Lake upgrade, Atlas authorization is denied

When rolling upgrade is performed, there might be a scenario where Ranger Admin could be undergoing upgrade by itself and hence the policy download could be affected.

During this period, access might be denied for certain Atlas entities. This issue is resolved once Ranger Admin is up and the policies are downloaded.

CDPD-55301: The ddlQueries and ALTERTABLE_* lineage are missing for Spark tables created using spark3-shell
The ddlQueries and outputFromProcesses (lineage) is missing for the alter queries.
CDPD-54990: The in-place migration of Hive table to Iceberg table with ALTER TABLE storage_handler using Beeline creates new iceberg_table entity but retains the old hive_table entity as is
Running the query results with Atlas having two entities with same name but different types. One with hive_table and another with iceberg_table.
CDPD-40346: The ddlQueries and ALTERTABLE_ADDCOLS lineage missing for Impala tables
The ALTERTABLE_ADDCOLS lineage has some issue when an Impala table is altered and the corresponding lineage is not created.
CDPD-55671: When one Atlas server host is not reachable (stopped), the GET request does multiple failover for approximately 4 minutes and takes around 2 minutes for every failover and finally the request fails.
CDPD-55122: Any user with ssh access can view the downloaded results
CDPD-57549: Rolling upgrade / ZDU: Atlas throws 503 when Zookeeper goes through upgrade
When Zookeeper goes through Rolling upgrade, Atlas REST calls throws 503 error. Entities created using Atlas Kafka hook are created in Atlas and no data loss is expected.
CDPD-46606: Performing Hive queries renders a notification for update data in the Hive table
CDPD-24089: Atlas creates HDFS path entities for GCP path and the qualified name of those entities does not have a cluster name appended.
CDPD-45642: When REST Notification server is down, messages from hooks are lost
CDPD-46940: REST notification need to be disabled when running import scripts
CDPD-22082: ADLS Gen2 metadata extraction: If the queue is not cleared before performing Incremental extraction, messages are lost.
After successfully running Bulk extraction, you must clear the queue before running Incremental extraction.
CDPD-19996: Atlas AWS S3 metadata extractor fails when High Availability is configured for IDBroker
If you have HA configured for IDBroker, make sure your cluster has only one IDBroker address in core-site.xml. If your cluster has two IDBroker addresses in core-site.xml, remove one of them, and the extractor must be able to retrieve the token from IDBroker.
CDPD-19798: Atlas /v2/search/basic API does not retrieve results when the search text mentioned in the entity filter criteria (like searching by Database or table name) has special characters like + - & | ! ( ) { } [ ] ^ " ~ * ? :
You can invoke the API and mention the search string (with special characters) in the query attribute in the search parameters.
ATLAS-3921: Currently there is no migration path from AWS S3 version 1 to AWS S3 version 2
CDPD-12668: Navigator Spark lineage can fail to render in Atlas
As part of content conversion from Navigator to Atlas, the conversion of some spark applications created a cyclic lineage reference in Atlas, which the Atlas UI fails to render. The cases occur when a Spark application uses data from a table and updates the same table.
CDPD-11941: Table creation events missed when multiple tables are created in the same Hive command
When multiple Hive tables are created in the same database in a single command, the Atlas audit log for the database may not capture all the table creation events. When there is a delay between creation commands, audits are created as expected.
CDPD-11940: Database audit record misses table delete
When a hive_table entity is created, the Atlas audit list for the parent database includes an update audit. However, at this time, the database does not show an audit when the table is deleted.
CDPD-11790: Simultaneous events on the Kafka topic queue can produce duplicate Atlas entities
In normal operation, Atlas receives metadata to create entities from multiple services on the same or separate Kafka topics. In some instances, such as for Spark jobs, metadata to create a table entity in Atlas is triggered from two separate messages: one for the Spark operation and a second for the table metadata from HMS. If the process metadata arrives before the table metadata, Atlas creates a temporary entity for any tables that are not already in Atlas and reconciles the temporary entity with the HMS metadata when the table metadata arrives.
However, in some cases such as when Spark SQL queries with the write.saveAsTable function, Atlas does not reconcile the temporary and final table metadata, resulting in two entities with the same qualified name and no lineage linking the table to the process entity.
This issue is not seen for other lineage queries from spark:
create table default.xx3 as select * from default.xx2
insert into yy2 select * from yy
insert overwrite table ww2 select * from ww1
Another case where this behavior may occur is when many REST API requests are sent at the same time.
CDPD-11692: Navigator table creation time not converted to Atlas
In converting content from Navigator to Atlas, the create time for Hive tables is not moved to Atlas.
CDPD-11338: Cluster names with upper case letters may appear in lower case in some process names
Atlas records the cluster name as lower case in qualifiedNames for some process names. The result is that the cluster name may appear in lower case for some processes (insert overwrite table) while it appears in upper case for other queries (ctas) performed on the same cluster.
CDPD-10576: Deleted Business Metadata attributes appear in Search Suggestions
Atlas search suggestions continue to show Business Metadata attributes even if the attributes have been deleted.
CDPD-10574: Suggestion order doesn't match search weights
At this time, the order of search suggestions does not honor the search weight for attributes.
CDPD-9095: Duplicate audits for renaming Hive tables
Renaming a Hive table results in duplicate ENTITY_UPDATE events in the corresponding Atlas entity audits, both for the table and for its columns.
CDPD-7982: HBase bridge stops at HBase table with deleted column family
Bridge importing metadata from HBase fails when it encounters an HBase table for which a column family was previously dropped. The error indicates:
Metadata service API org.apache.atlas.AtlasClientV2$API_V2@58112bc4 failed with status 404 (Not Found) Response Body 
({""errorCode"":""ATLAS-404-00-007"",""errorMessage"":""Invalid instance creation/updation parameters passed : 
hbase_column_family.table: mandatory attribute value missing in type hbase_column_family""}) 
CDPD-7781: TLS certificates not validated on Firefox
Atlas is not checking for valid TLS certificates when the UI is opened in FireFox browsers.
CDPD-6675: Irregular qualifiedName format for Azure storage
The qualifiedName for hdfs_path entities created from Azure blog locations (ABFS) doesn't have the clusterName appended to it as do hdfs_path entities in other location types.
CDPD-5933 and CDPD-5931: Unexpected Search Results When Using Regular Expressions in Basic Searches on Classifications
When you include a regular expression or wildcard in the search criteria for a classification in the Basic Search, the results may differ unexpectedly from when full classification names are included. For example, the Exclude sub-classifications option is respected when using a full classification name as the search criteria; when using part of the classification name and the wildcard (*) with Exclude sub-classifications turned off, entities marked with sub-classifications are not included in the results. Other instances of unexpected results include case-sensitivity.
CDPD-4762: Spark metadata order may affect lineage
Atlas may record unexpected lineage relationships when metadata collection from the Spark Atlas Connector occurs out of sequence from metadata collection from HMS. For example, if an ALTER TABLE operation in Spark changing a table name and is reported to Atlas before HMS has processed the change, Atlas may not show the correct lineage relationships to the altered table.
CDPD-4545: Searches for Qualified Names with "@" doesn't fetch the correct results
When searching Atlas qualifiedName values that include an "at" character (@), Atlas does not return the expected results or generate appropriate search suggestions.
Consider leaving out the portion of the search string that includes the @ sign, using the wildcard character * instead.
CDPD-3208: Table alias values are not found in search
When table names are changed, Atlas keeps the old name of the table in a list of aliases. These values are not included in the search index in this release, so after a table name is changed, searching on the old table name will not return the entity for the table.
CDPD-3160: Hive lineage missing for INSERT OVERWRITE queries
Lineage is not generated for Hive INSERT OVERWRITE queries on partitioned tables. Lineage is generated as expected for CTAS queries from partitioned tables.
CDPD-3125: Logging out of Atlas does not manage the external authentication
At this time, Atlas does not communicate a log-out event with the external authentication management, Apache Knox. When you log out of Atlas, you can still open the instance of Atlas from the same web browser without re-authentication.
To prevent access to Atlas after logging out, close all browser windows and exit the browser.
CDPD-1892: Ranking of top results in free-text search not intuitive
The Free-text search feature ranks results based on which attributes match the search criteria. The attribute ranking is evolving and therefore the choice of top results may not be intuitive in this release.
If you don't find what you need in the top 5 results, use the full results or refine the search.
CDPD-1884: Free text search in Atlas is case sensitive
The free text search bar in the top of the screen allows you to search across entity types and through all text attributes for all entities. The search shows the top 5 results that match the search terms at any place in the text (*term* logic). It also shows suggestions that match the search terms that begin with the term (term* logic). However, in this release, the search results are case-sensitive.
If you don't see the results you expect, repeat the search changing the case of the search terms.
CDPD-1823: Queries with ? wildcard return unexpected results
DSL queries in Advanced Search return incorrect results when the query text includes a question mark (?) wildcard character. This problem occurs in environments where trusted proxy for Knox is enabled, which is always the case for CDP.
CDPD-1664: Guest users are redirected incorrectly
Authenticated users logging in to Atlas are redirected to the CDP Knox-based login page. However, if a guest user (without Atlas privileges) attempts to log in to Atlas, the user is redirected instead to the Atlas login page.
To avoid this problem, open the Atlas Dashboard in a private or incognito browser window.
CDPD-922: IsUnique relationship attribute not honored
The Atlas model includes the ability to ensure that an attribute can be set to a specific value in only one relationship entity across the cluster metadata. For example, if you wanted to add metadata tags to relationships that you wanted to make sure were unique in the system, you could design the relationship attribute with the property "IsUnique" equal true. However, in this release, the IsUnique attribute is not enforced.
CDPD-24058: The Atlas-Kafka hook creates a new entity instead of linking them

When the tool is used and later the plugin is enabled in Kafka configurations, new incomplete topic entities is created. The tool is not linking the existing topics with the clients.

CDPD-29663: Error while connecting topic with schema in Atlas

The error occurred when Schema Registry tried to make a relationship in Atlas between a schema and a non-existent corresponding topic.