Monospaced | Used for commands, HTTP requests and responses, and code blocks. |
<Monospaced> | User entered values. |
[Monospaced] | Optional values. When the value is not specified, the default value is used. |
Italics | Important phrases and words. |
The HTTP REST API supports the complete FileSystem/FileContext interface for HDFS. The operations and the corresponding FileSystem/FileContext methods are shown in the next section. The section HTTP Query Parameter Dictionary specifies the parameter details, such as the defaults and the valid values.
The FileSystem scheme of WebHDFS is "webhdfs://". A WebHDFS FileSystem URI has the following format.
webhdfs://<HOST>:<HTTP_PORT>/<PATH>
The above WebHDFS URI corresponds to the below HDFS URI.
hdfs://<HOST>:<RPC_PORT>/<PATH>
In the REST API, the prefix "/webhdfs/v1" is inserted in the path and a query is appended at the end. Therefore, the corresponding HTTP URL has the following format.
http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=...
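The mapping from a WebHDFS FileSystem URI to its REST URL can be sketched as follows. This is a minimal illustration, not part of any Hadoop client: the function name `webhdfs_to_http_url` and the host, port, and path in the example call are assumptions made for the sketch.

```python
from urllib.parse import urlsplit, urlencode, quote


def webhdfs_to_http_url(webhdfs_uri, op, **params):
    """Map webhdfs://<HOST>:<HTTP_PORT>/<PATH> to the REST URL for one operation."""
    parts = urlsplit(webhdfs_uri)
    if parts.scheme != "webhdfs":
        raise ValueError("expected a webhdfs:// URI, got %r" % webhdfs_uri)
    # "op" goes first, followed by any operation-specific parameters.
    query = urlencode([("op", op)] + sorted(params.items()))
    # Insert the /webhdfs/v1 prefix in front of the file system path.
    return "http://%s/webhdfs/v1%s?%s" % (parts.netloc, quote(parts.path), query)


# Hypothetical host, port, and path, used only for illustration.
print(webhdfs_to_http_url("webhdfs://namenode.example.com:50070/user/alice/a.txt",
                          "OPEN", offset=0))
# -> http://namenode.example.com:50070/webhdfs/v1/user/alice/a.txt?op=OPEN&offset=0
```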
Below are the HDFS configuration options for WebHDFS.
Property Name | Description |
---|---|
dfs.webhdfs.enabled | Enable/disable WebHDFS in Namenodes and Datanodes |
dfs.web.authentication.kerberos.principal | The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. The HTTP Kerberos principal MUST start with 'HTTP/' per Kerberos HTTP SPNEGO specification. |
dfs.web.authentication.kerberos.keytab | The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. |
When security is off, the authenticated user is the username specified in the user.name query parameter. If the user.name parameter is not set, the server may either set the authenticated user to a default web user, if there is any, or return an error response.
When security is on, authentication is performed by either Hadoop delegation token or Kerberos SPNEGO. If a token is set in the delegation query parameter, the authenticated user is the user encoded in the token. If the delegation parameter is not set, the user is authenticated by Kerberos SPNEGO.
Below are examples using the curl command tool.
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?[user.name=<USER>&]op=..."
curl -i --negotiate -u : "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=..."
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?delegation=<TOKEN>&op=..."
When the proxy user feature is enabled, a proxy user P may submit a request on behalf of another user U. The username of U must be specified in the doas query parameter unless a delegation token is presented for authentication. In that case, the information of both users P and U must be encoded in the delegation token.
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?[user.name=<USER>&]doas=<USER>&op=..."
curl -i --negotiate -u : "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?doas=<USER>&op=..."
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?delegation=<TOKEN>&op=..."
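The choice of authentication-related query parameters described above can be sketched as a small helper. The function name `auth_params` is hypothetical, and the rule that a delegation token suppresses `user.name` and `doas` is this sketch's reading of the text above, not a statement about any client library. Kerberos SPNEGO is negotiated in HTTP headers and therefore adds no query parameter here.

```python
from urllib.parse import urlencode


def auth_params(user_name=None, delegation=None, doas=None):
    """Return the authentication-related query parameters as (name, value) pairs."""
    if delegation is not None:
        # The token already encodes the user (and the proxy user, if any),
        # so neither user.name nor doas is sent alongside it.
        return [("delegation", delegation)]
    params = []
    if user_name is not None:
        params.append(("user.name", user_name))  # used when security is off
    if doas is not None:
        params.append(("doas", doas))            # proxy user acting as another user
    return params


# Hypothetical usernames, used only for illustration.
print(urlencode(auth_params(user_name="alice", doas="bob") + [("op", "LISTSTATUS")]))
# -> user.name=alice&doas=bob&op=LISTSTATUS
```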
curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE[&overwrite=<true|false>][&blocksize=<LONG>][&replication=<SHORT>][&permission=<OCTAL>][&buffersize=<INT>]"
The request is redirected to a datanode where the file data is to be written:
HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE...
Content-Length: 0
curl -i -X PUT -T <LOCAL_FILE> "http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE..."
The client receives a 201 Created response with zero content length and the WebHDFS URI of the file in the Location header:
HTTP/1.1 201 Created
Location: webhdfs://<HOST>:<PORT>/<PATH>
Content-Length: 0
Note that the two-step create/append exists to prevent clients from sending data before the redirect. This issue is addressed by the "Expect: 100-continue" header in HTTP/1.1; see RFC 2616, Section 8.2.3. Unfortunately, some software libraries (e.g. the Jetty 6 HTTP server and the Java 6 HTTP client) do not implement "Expect: 100-continue" correctly. The two-step create/append is a temporary workaround for these library bugs.
See also: overwrite, blocksize, replication, permission, buffersize, FileSystem.create
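The two-step create flow can be sketched in client code as follows. This is an illustrative sketch only: `http_send` and `two_step_create` are hypothetical helper names, and the transport is passed in as a parameter so the flow can be exercised without a cluster. Python's `http.client` does not follow redirects on its own, which is what step 1 requires.

```python
import http.client
from urllib.parse import urlsplit


def http_send(method, url, body=None):
    """Send one request without following redirects; return (status, headers)."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc)
    try:
        target = parts.path + ("?" + parts.query if parts.query else "")
        conn.request(method, target, body=body)
        resp = conn.getresponse()
        resp.read()  # drain the (normally empty) response body
        return resp.status, {k.lower(): v for k, v in resp.getheaders()}
    finally:
        conn.close()


def two_step_create(namenode_url, data, send=http_send):
    """Create a file: step 1 asks the namenode, step 2 writes to the datanode."""
    # Step 1: no file data is sent; the namenode answers with a 307 redirect.
    status, headers = send("PUT", namenode_url)
    if status != 307 or "location" not in headers:
        raise RuntimeError("expected a 307 redirect, got %s" % status)
    # Step 2: send the file data to the datanode URL from the Location header.
    status, headers = send("PUT", headers["location"], data)
    if status != 201:
        raise RuntimeError("expected 201 Created, got %s" % status)
    # The Location header of the 201 response carries the WebHDFS URI of the file.
    return headers.get("location")
```

The same shape applies to APPEND, with POST instead of PUT and a 200 response instead of 201 in the second step.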
curl -i -X POST "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=APPEND[&buffersize=<INT>]"
The request is redirected to a datanode where the file data is to be appended:
HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=APPEND...
Content-Length: 0
curl -i -X POST -T <LOCAL_FILE> "http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=APPEND..."
The client receives a response with zero content length:
HTTP/1.1 200 OK
Content-Length: 0
See the note in the previous section for the description of why this operation requires two steps.
See also: buffersize, FileSystem.append
curl -i -X POST "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CONCAT&sources=<PATHS>"
The client receives a response with zero content length:
HTTP/1.1 200 OK
Content-Length: 0
See also: sources, FileSystem.concat
curl -i -L "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN[&offset=<LONG>][&length=<LONG>][&buffersize=<INT>]"
The request is redirected to a datanode where the file data can be read:
HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=OPEN...
Content-Length: 0
The client follows the redirect to the datanode and receives the file data:
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 22

Hello, webhdfs user!
See also: offset, length, buffersize, FileSystem.open
curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=MKDIRS[&permission=<OCTAL>]"
The client receives a response with a boolean JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

{"boolean": true}
See also: permission, FileSystem.mkdirs
curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATESYMLINK&destination=<PATH>[&createParent=<true|false>]"
The client receives a response with zero content length:
HTTP/1.1 200 OK
Content-Length: 0
See also: destination, createParent, FileSystem.createSymlink
curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=RENAME&destination=<PATH>"
The client receives a response with a boolean JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

{"boolean": true}
See also: destination, FileSystem.rename
curl -i -X DELETE "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=DELETE[&recursive=<true|false>]"
The client receives a response with a boolean JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

{"boolean": true}
See also: recursive, FileSystem.delete
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILESTATUS"
The client receives a response with a FileStatus JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

{
  "FileStatus":
  {
    "accessTime"      : 0,
    "blockSize"       : 0,
    "group"           : "supergroup",
    "length"          : 0,             //in bytes, zero for directories
    "modificationTime": 1320173277227,
    "owner"           : "webuser",
    "pathSuffix"      : "",
    "permission"      : "777",
    "replication"     : 0,
    "type"            : "DIRECTORY"    //enum {FILE, DIRECTORY, SYMLINK}
  }
}
See also: FileSystem.getFileStatus
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS"
The client receives a response with a FileStatuses JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 427

{
  "FileStatuses":
  {
    "FileStatus":
    [
      {
        "accessTime"      : 1320171722771,
        "blockSize"       : 33554432,
        "group"           : "supergroup",
        "length"          : 24930,
        "modificationTime": 1320171722771,
        "owner"           : "webuser",
        "pathSuffix"      : "a.patch",
        "permission"      : "644",
        "replication"     : 1,
        "type"            : "FILE"
      },
      {
        "accessTime"      : 0,
        "blockSize"       : 0,
        "group"           : "supergroup",
        "length"          : 0,
        "modificationTime": 1320895981256,
        "owner"           : "szetszwo",
        "pathSuffix"      : "bar",
        "permission"      : "711",
        "replication"     : 0,
        "type"            : "DIRECTORY"
      },
      ...
    ]
  }
}
See also: FileSystem.listStatus
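A client might turn a LISTSTATUS response into full paths along these lines. This is a minimal sketch: the helper name `list_paths` and the directory `/foo` are made up for the example, and it assumes that each `pathSuffix` is relative to the directory that was listed.

```python
import json
import posixpath


def list_paths(directory, response_body):
    """Return (full_path, type) pairs from a LISTSTATUS JSON response body."""
    statuses = json.loads(response_body)["FileStatuses"]["FileStatus"]
    # Assumption: pathSuffix is relative to the listed directory, so join the two.
    return [(posixpath.join(directory, s["pathSuffix"]), s["type"]) for s in statuses]


# A trimmed-down response body based on the example above.
body = """{"FileStatuses": {"FileStatus": [
  {"pathSuffix": "a.patch", "type": "FILE", "length": 24930},
  {"pathSuffix": "bar", "type": "DIRECTORY", "length": 0}
]}}"""
print(list_paths("/foo", body))
# -> [('/foo/a.patch', 'FILE'), ('/foo/bar', 'DIRECTORY')]
```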
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETCONTENTSUMMARY"
The client receives a response with a ContentSummary JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

{
  "ContentSummary":
  {
    "directoryCount": 2,
    "fileCount"     : 1,
    "length"        : 24930,
    "quota"         : -1,
    "spaceConsumed" : 24930,
    "spaceQuota"    : -1
  }
}
See also: FileSystem.getContentSummary
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILECHECKSUM"
The request is redirected to a datanode:
HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=GETFILECHECKSUM...
Content-Length: 0
The client follows the redirect to the datanode and receives a FileChecksum JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

{
  "FileChecksum":
  {
    "algorithm": "MD5-of-1MD5-of-512CRC32",
    "bytes"    : "eadb10de24aa315748930df6e185c0d ...",
    "length"   : 28
  }
}
See also: FileSystem.getFileChecksum
curl -i "http://<HOST>:<PORT>/webhdfs/v1/?op=GETHOMEDIRECTORY"
The client receives a response with a Path JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

{"Path": "/user/szetszwo"}
See also: FileSystem.getHomeDirectory
curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETPERMISSION[&permission=<OCTAL>]"
The client receives a response with zero content length:
HTTP/1.1 200 OK
Content-Length: 0
See also: permission, FileSystem.setPermission
curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETOWNER[&owner=<USER>][&group=<GROUP>]"
The client receives a response with zero content length:
HTTP/1.1 200 OK
Content-Length: 0
See also: owner, group, FileSystem.setOwner
curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETREPLICATION[&replication=<SHORT>]"
The client receives a response with a boolean JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

{"boolean": true}
See also: replication, FileSystem.setReplication
curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETTIMES[&modificationtime=<TIME>][&accesstime=<TIME>]"
The client receives a response with zero content length:
HTTP/1.1 200 OK
Content-Length: 0
See also: modificationtime, accesstime, FileSystem.setTimes
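Building the SETTIMES query from date values might look like the sketch below. It assumes that `<TIME>` is a count of milliseconds since the Unix epoch, which matches the magnitude of the `modificationTime` values in the examples but is not stated in this section. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(dt):
    """Convert a timezone-aware datetime to milliseconds since the epoch (assumed unit)."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_millis(ms):
    """Convert milliseconds since the epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def settimes_query(modificationtime=-1, accesstime=-1):
    """Build the SETTIMES query string; -1 keeps the corresponding time unchanged."""
    return urlencode([("op", "SETTIMES"),
                      ("modificationtime", modificationtime),
                      ("accesstime", accesstime)])


print(settimes_query(modificationtime=to_millis(EPOCH + timedelta(seconds=1))))
# -> op=SETTIMES&modificationtime=1000&accesstime=-1
```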
curl -i "http://<HOST>:<PORT>/webhdfs/v1/?op=GETDELEGATIONTOKEN&renewer=<USER>"
The client receives a response with a Token JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

{
  "Token":
  {
    "urlString": "JQAIaG9y..."
  }
}
See also: renewer, FileSystem.getDelegationToken
curl -i "http://<HOST>:<PORT>/webhdfs/v1/?op=GETDELEGATIONTOKENS&renewer=<USER>"
The client receives a response with a Tokens JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

{
  "Tokens":
  {
    "Token":
    [
      {
        "urlString":"KAAKSm9i ..."
      }
    ]
  }
}
See also: renewer, FileSystem.getDelegationTokens
curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/?op=RENEWDELEGATIONTOKEN&token=<TOKEN>"
The client receives a response with a long JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

{"long": 1320962673997}               //the new expiration time
See also: token, FileSystem.renewDelegationToken
curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/?op=CANCELDELEGATIONTOKEN&token=<TOKEN>"
The client receives a response with zero content length:
HTTP/1.1 200 OK
Content-Length: 0
See also: token, FileSystem.cancelDelegationToken
When an operation fails, the server may throw an exception. The JSON schema of error responses is defined in RemoteException JSON schema. The table below shows the mapping from exceptions to HTTP response codes.
Exceptions | HTTP Response Codes |
---|---|
IllegalArgumentException | 400 Bad Request |
UnsupportedOperationException | 400 Bad Request |
SecurityException | 401 Unauthorized |
IOException | 403 Forbidden |
FileNotFoundException | 404 Not Found |
RuntimeException | 500 Internal Server Error |
Below are examples of exception responses.
HTTP/1.1 400 Bad Request
Content-Type: application/json
Transfer-Encoding: chunked

{
  "RemoteException":
  {
    "exception"    : "IllegalArgumentException",
    "javaClassName": "java.lang.IllegalArgumentException",
    "message"      : "Invalid value for webhdfs parameter \"permission\": ..."
  }
}

HTTP/1.1 401 Unauthorized
Content-Type: application/json
Transfer-Encoding: chunked

{
  "RemoteException":
  {
    "exception"    : "SecurityException",
    "javaClassName": "java.lang.SecurityException",
    "message"      : "Failed to obtain user group information: ..."
  }
}

HTTP/1.1 403 Forbidden
Content-Type: application/json
Transfer-Encoding: chunked

{
  "RemoteException":
  {
    "exception"    : "AccessControlException",
    "javaClassName": "org.apache.hadoop.security.AccessControlException",
    "message"      : "Permission denied: ..."
  }
}

HTTP/1.1 404 Not Found
Content-Type: application/json
Transfer-Encoding: chunked

{
  "RemoteException":
  {
    "exception"    : "FileNotFoundException",
    "javaClassName": "java.io.FileNotFoundException",
    "message"      : "File does not exist: /foo/a.patch"
  }
}
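On the client side, an error response like the ones above could be converted into an exception object as sketched here. The class `RemoteException` and the function `parse_error` are hypothetical names for this illustration; the sketch treats `javaClassName` as optional, as the RemoteException JSON schema does.

```python
import json


class RemoteException(Exception):
    """Client-side view of a WebHDFS error response (hypothetical helper class)."""

    def __init__(self, status, exception, message, java_class_name=None):
        super().__init__("%s (%d): %s" % (exception, status, message))
        self.status = status
        self.exception = exception
        self.message = message
        self.java_class_name = java_class_name


def parse_error(status, body):
    """Build a RemoteException from an HTTP status code and a JSON error body."""
    err = json.loads(body)["RemoteException"]
    # javaClassName is optional in the schema, so read it with .get().
    return RemoteException(status, err["exception"], err["message"],
                           err.get("javaClassName"))


body = """{"RemoteException": {
  "exception": "FileNotFoundException",
  "javaClassName": "java.io.FileNotFoundException",
  "message": "File does not exist: /foo/a.patch"}}"""
print(parse_error(404, body))
# -> FileNotFoundException (404): File does not exist: /foo/a.patch
```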
All operations, except for OPEN, either return a zero-length response or a JSON response. For OPEN, the response is an octet-stream. The JSON schemas are shown below. See draft-zyp-json-schema-03 for the syntax definitions of the JSON schemas.
{
  "name"      : "boolean",
  "properties":
  {
    "boolean":
    {
      "description": "A boolean value",
      "type"       : "boolean",
      "required"   : true
    }
  }
}
See also: MKDIRS, RENAME, DELETE, SETREPLICATION
{
  "name"      : "ContentSummary",
  "properties":
  {
    "ContentSummary":
    {
      "type"      : "object",
      "properties":
      {
        "directoryCount": { "description": "The number of directories.", "type": "integer", "required": true },
        "fileCount"     : { "description": "The number of files.", "type": "integer", "required": true },
        "length"        : { "description": "The number of bytes used by the content.", "type": "integer", "required": true },
        "quota"         : { "description": "The namespace quota of this directory.", "type": "integer", "required": true },
        "spaceConsumed" : { "description": "The disk space consumed by the content.", "type": "integer", "required": true },
        "spaceQuota"    : { "description": "The disk space quota.", "type": "integer", "required": true }
      }
    }
  }
}
See also: GETCONTENTSUMMARY
{
  "name"      : "FileChecksum",
  "properties":
  {
    "FileChecksum":
    {
      "type"      : "object",
      "properties":
      {
        "algorithm": { "description": "The name of the checksum algorithm.", "type": "string", "required": true },
        "bytes"    : { "description": "The byte sequence of the checksum in hexadecimal.", "type": "string", "required": true },
        "length"   : { "description": "The length of the bytes (not the length of the string).", "type": "integer", "required": true }
      }
    }
  }
}
See also: GETFILECHECKSUM
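The relationship between `bytes` and `length` in the FileChecksum schema can be checked with a few lines of client code. This is only a sketch: the function name `checksum_bytes`, the algorithm label, and the three-byte checksum in the example are made up for illustration.

```python
def checksum_bytes(file_checksum):
    """Decode the hexadecimal "bytes" field and verify it against "length"."""
    raw = bytes.fromhex(file_checksum["bytes"])
    # "length" counts the decoded bytes, not the characters of the hex string.
    if len(raw) != file_checksum["length"]:
        raise ValueError("expected %d bytes, decoded %d"
                         % (file_checksum["length"], len(raw)))
    return raw


# A made-up checksum object, far shorter than a real one.
example = {"algorithm": "EXAMPLE", "bytes": "00ff10", "length": 3}
print(len(checksum_bytes(example)))
# -> 3
```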
{
  "name"      : "FileStatus",
  "properties":
  {
    "FileStatus": fileStatusProperties      //See FileStatus Properties
  }
}
See also: FileStatus Properties, GETFILESTATUS, FileStatus
JavaScript syntax is used to define fileStatusProperties so that it can be referred to in both the FileStatus and FileStatuses JSON schemas.
var fileStatusProperties =
{
  "type"      : "object",
  "properties":
  {
    "accessTime"      : { "description": "The access time.", "type": "integer", "required": true },
    "blockSize"       : { "description": "The block size of a file.", "type": "integer", "required": true },
    "group"           : { "description": "The group owner.", "type": "string", "required": true },
    "length"          : { "description": "The number of bytes in a file.", "type": "integer", "required": true },
    "modificationTime": { "description": "The modification time.", "type": "integer", "required": true },
    "owner"           : { "description": "The user who is the owner.", "type": "string", "required": true },
    "pathSuffix"      : { "description": "The path suffix.", "type": "string", "required": true },
    "permission"      : { "description": "The permission represented as an octal string.", "type": "string", "required": true },
    "replication"     : { "description": "The number of replication of a file.", "type": "integer", "required": true },
    "symlink"         : { "description": "The link target of a symlink.", "type": "string" },      //an optional property
    "type"            : { "description": "The type of the path object.", "enum": ["FILE", "DIRECTORY", "SYMLINK"], "required": true }
  }
};
A FileStatuses JSON object represents an array of FileStatus JSON objects.
{
  "name"      : "FileStatuses",
  "properties":
  {
    "FileStatuses":
    {
      "type"      : "object",
      "properties":
      {
        "FileStatus":
        {
          "description": "An array of FileStatus",
          "type"       : "array",
          "items"      : fileStatusProperties      //See FileStatus Properties
        }
      }
    }
  }
}
See also: FileStatus Properties, LISTSTATUS, FileStatus
{
  "name"      : "long",
  "properties":
  {
    "long":
    {
      "description": "A long integer value",
      "type"       : "integer",
      "required"   : true
    }
  }
}
See also: RENEWDELEGATIONTOKEN
{
  "name"      : "Path",
  "properties":
  {
    "Path":
    {
      "description": "The string representation of a Path.",
      "type"       : "string",
      "required"   : true
    }
  }
}
See also: GETHOMEDIRECTORY, Path
{
  "name"      : "RemoteException",
  "properties":
  {
    "RemoteException":
    {
      "type"      : "object",
      "properties":
      {
        "exception"    : { "description": "Name of the exception", "type": "string", "required": true },
        "message"      : { "description": "Exception message", "type": "string", "required": true },
        "javaClassName": { "description": "Java class name of the exception", "type": "string" }      //an optional property
      }
    }
  }
}
See also: Error Responses
{
  "name"      : "Token",
  "properties":
  {
    "Token": tokenProperties      //See Token Properties
  }
}
See also: Token Properties, GETDELEGATIONTOKEN, the note in Delegation.
JavaScript syntax is used to define tokenProperties so that it can be referred to in both the Token and Tokens JSON schemas.
var tokenProperties =
{
  "type"      : "object",
  "properties":
  {
    "urlString":
    {
      "description": "A delegation token encoded as a URL safe string.",
      "type"       : "string",
      "required"   : true
    }
  }
}
A Tokens JSON object represents an array of Token JSON objects.
{
  "name"      : "Tokens",
  "properties":
  {
    "Tokens":
    {
      "type"      : "object",
      "properties":
      {
        "Token":
        {
          "description": "An array of Token",
          "type"       : "array",
          "items"      : tokenProperties      //See Token Properties
        }
      }
    }
  }
}
See also: Token Properties, GETDELEGATIONTOKENS, the note in Delegation.
Name | accesstime |
---|---|
Description | The access time of a file/directory. |
Type | long |
Default Value | -1 (means keeping it unchanged) |
Valid Values | -1 or a timestamp |
Syntax | Any integer. |
See also: SETTIMES
Name | blocksize |
---|---|
Description | The block size of a file. |
Type | long |
Default Value | Specified in the configuration. |
Valid Values | > 0 |
Syntax | Any integer. |
See also: CREATE
Name | buffersize |
---|---|
Description | The size of the buffer used in transferring data. |
Type | int |
Default Value | Specified in the configuration. |
Valid Values | > 0 |
Syntax | Any integer. |
See also: CREATE, APPEND, OPEN
Name | createparent |
---|---|
Description | If the parent directories do not exist, should they be created? |
Type | boolean |
Default Value | false |
Valid Values | true or false |
Syntax | true or false |
See also: CREATESYMLINK
Name | delegation |
---|---|
Description | The delegation token used for authentication. |
Type | String |
Default Value | <empty> |
Valid Values | An encoded token. |
Syntax | See the note below. |
Note that delegation tokens are encoded as a URL safe string; see encodeToUrlString() and decodeFromUrlString(String) in org.apache.hadoop.security.token.Token for the details of the encoding.
See also: Authentication
Name | destination |
---|---|
Description | The destination path. |
Type | Path |
Default Value | <empty> (an invalid path) |
Valid Values | An absolute FileSystem path without scheme and authority. |
Syntax | Any path. |
See also: CREATESYMLINK, RENAME
Name | doas |
---|---|
Description | Allowing a proxy user to do as another user. |
Type | String |
Default Value | null |
Valid Values | Any valid username. |
Syntax | Any string. |
See also: Proxy Users
Name | group |
---|---|
Description | The name of a group. |
Type | String |
Default Value | <empty> (means keeping it unchanged) |
Valid Values | Any valid group name. |
Syntax | Any string. |
See also: SETOWNER
Name | length |
---|---|
Description | The number of bytes to be processed. |
Type | long |
Default Value | null (means the entire file) |
Valid Values | >= 0 or null |
Syntax | Any integer. |
See also: OPEN
Name | modificationtime |
---|---|
Description | The modification time of a file/directory. |
Type | long |
Default Value | -1 (means keeping it unchanged) |
Valid Values | -1 or a timestamp |
Syntax | Any integer. |
See also: SETTIMES
Name | offset |
---|---|
Description | The starting byte position. |
Type | long |
Default Value | 0 |
Valid Values | >= 0 |
Syntax | Any integer. |
See also: OPEN
Name | op |
---|---|
Description | The name of the operation to be executed. |
Type | enum |
Default Value | null (an invalid value) |
Valid Values | Any valid operation name. |
Syntax | Any string. |
See also: Operations
Name | overwrite |
---|---|
Description | If a file already exists, should it be overwritten? |
Type | boolean |
Default Value | false |
Valid Values | true or false |
Syntax | true or false |
See also: CREATE
Name | owner |
---|---|
Description | The username of the owner of a file/directory. |
Type | String |
Default Value | <empty> (means keeping it unchanged) |
Valid Values | Any valid username. |
Syntax | Any string. |
See also: SETOWNER
Name | permission |
---|---|
Description | The permission of a file/directory. |
Type | Octal |
Default Value | 755 |
Valid Values | 0 - 1777 |
Syntax | Any radix-8 integer (leading zeros may be omitted.) |
See also: CREATE, MKDIRS, SETPERMISSION
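A client could validate a permission value before sending it, along the lines of the sketch below. The function names are hypothetical, and the parser is deliberately lenient: Python's `int(text, 8)` also accepts surrounding whitespace and an `0o` prefix, which the server may not.

```python
def parse_permission(text="755"):
    """Parse an octal permission string and check that it lies in 0 - 1777 (octal)."""
    value = int(text, 8)  # raises ValueError for digits outside 0-7
    if not 0 <= value <= 0o1777:
        raise ValueError("permission out of range: %r" % text)
    return value


def format_permission(value):
    """Render a permission value as an octal string without leading zeros."""
    return format(value, "o")


print(format_permission(parse_permission("0644")))
# -> 644
```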
Name | recursive |
---|---|
Description | Should the operation act on the content in the subdirectories? |
Type | boolean |
Default Value | false |
Valid Values | true or false |
Syntax | true or false |
See also: DELETE
Name | renewer |
---|---|
Description | The username of the renewer of a delegation token. |
Type | String |
Default Value | <empty> (means the current user) |
Valid Values | Any valid username. |
Syntax | Any string. |
See also: GETDELEGATIONTOKEN, GETDELEGATIONTOKENS
Name | replication |
---|---|
Description | The number of replications of a file. |
Type | short |
Default Value | Specified in the configuration. |
Valid Values | > 0 |
Syntax | Any integer. |
See also: CREATE, SETREPLICATION
Name | sources |
---|---|
Description | A list of source paths. |
Type | String |
Default Value | <empty> |
Valid Values | A list of comma-separated absolute FileSystem paths without scheme and authority. |
Syntax | Any string. |
See also: CONCAT
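Assembling the `sources` value for CONCAT might be done as in this sketch. The function name `concat_query` and the example paths are made up. The check for a comma inside a path is an assumption of this sketch, added because a comma would be indistinguishable from the list separator.

```python
from urllib.parse import urlencode


def concat_query(sources):
    """Build the CONCAT query string from a list of absolute source paths."""
    for path in sources:
        # Paths must be absolute and carry no scheme or authority.
        if not path.startswith("/") or "://" in path:
            raise ValueError("not an absolute path without scheme: %r" % path)
        if "," in path:
            raise ValueError("path contains the list separator: %r" % path)
    # Keep "/" and "," unescaped so the query matches the form shown for CONCAT.
    return urlencode([("op", "CONCAT"), ("sources", ",".join(sources))], safe="/,")


print(concat_query(["/foo/part-1", "/foo/part-2"]))
# -> op=CONCAT&sources=/foo/part-1,/foo/part-2
```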
Name | token |
---|---|
Description | The delegation token used for the operation. |
Type | String |
Default Value | <empty> |
Valid Values | An encoded token. |
Syntax | See the note in Delegation. |
See also: RENEWDELEGATIONTOKEN, CANCELDELEGATIONTOKEN
Name | user.name |
---|---|
Description | The authenticated user; see Authentication. |
Type | String |
Default Value | null |
Valid Values | Any valid username. |
Syntax | Any string. |
See also: Authentication