File system for content repository
If your function processes very large files, the data may not fit in memory, which causes errors. In that case, you can attach a File System to your Lambda and configure DataFlow Functions to use that File System as the Content Repository.
To attach a File System to your Lambda, you must connect the Lambda to the VPC where the File System is provisioned.
However, by default, Lambda runs your functions in a secure VPC with access to AWS services and the internet. Lambda owns this VPC, which is not connected to your account's default VPC. When you connect a function to a VPC in your account, the function cannot access the internet unless your VPC provides access. Access to the internet is required so that DataFlow Functions can request the DataFlow Catalog and retrieve the flow definition to be executed.
In such a situation, a recommended option is to create a VPC with a public subnet and multiple private subnets (to increase resiliency across multiple availability zones). You can then create an Internet Gateway and NAT Gateway to give your Lambda access to the internet. For more information, see the AWS Knowledge Center.
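The subnet layout described above can be planned before any resources are created. The following is a minimal sketch, not an official procedure: the `plan_subnets` helper, the `10.0.0.0/16` CIDR range, and the availability zone names are all assumptions chosen for illustration. It reserves one public subnet (where the NAT Gateway would be placed) and one private subnet per availability zone (where the Lambda would be attached).

```python
import ipaddress


def plan_subnets(vpc_cidr, availability_zones, new_prefix=24):
    """Split a VPC CIDR into one public subnet and one private subnet per AZ.

    Hypothetical helper for illustration only; it creates nothing in AWS.
    """
    network = ipaddress.ip_network(vpc_cidr)
    # subnets() yields consecutive, non-overlapping blocks of the requested size.
    blocks = network.subnets(new_prefix=new_prefix)

    # The public subnet hosts the NAT Gateway and routes to the Internet Gateway.
    plan = [{"role": "public", "az": availability_zones[0], "cidr": str(next(blocks))}]

    # One private subnet per availability zone; the Lambda is attached to these.
    for az in availability_zones:
        plan.append({"role": "private", "az": az, "cidr": str(next(blocks))})
    return plan


# Assumed example values; replace them with your own range and zones.
plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"])
for subnet in plan:
    print(subnet["role"], subnet["az"], subnet["cidr"])
```

In this layout, the private subnets send outbound traffic to the NAT Gateway, and the public subnet sends it to the Internet Gateway. This is what gives a VPC-connected Lambda access to the internet.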
Depending on your Lambda’s configuration and the throughput of the events triggering the function, you may have multiple instances of the Lambda running concurrently and each instance would share the same File System. For more information, see this AWS Compute Blog article.
To ensure each instance of the function has its own “content repository”, you can set the CONTENT_REPO environment variable when configuring your Lambda and give it the value of the Local mount path of the attached File System.
In this case, each instance of the function will create its content repository on the File System under the following path:
You can share the same File System across multiple Lambda functions if desired, depending on the I/O performance you need.
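The configuration described in this section (VPC attachment, File System mount, and the CONTENT_REPO variable) can be sketched as a single set of Lambda settings. This is a minimal sketch under stated assumptions: the File System is assumed to be exposed through an Amazon EFS access point, and the `build_lambda_config` helper, the function name, ARN, subnet IDs, security group ID, and `EXAMPLE_VAR` are hypothetical placeholders. The dictionary keys follow the parameters of the AWS Lambda UpdateFunctionConfiguration API.

```python
def build_lambda_config(function_name, access_point_arn, mount_path,
                        subnet_ids, security_group_ids, existing_variables=None):
    """Assemble settings for Lambda's UpdateFunctionConfiguration call.

    Hypothetical helper for illustration; it only builds a dictionary.
    """
    # Lambda only accepts local mount paths located under /mnt/.
    if not mount_path.startswith("/mnt/"):
        raise ValueError("local mount path must start with /mnt/")

    # The Environment block replaces all existing variables, so carry them over.
    variables = dict(existing_variables or {})
    # CONTENT_REPO takes the value of the Local mount path of the File System.
    variables["CONTENT_REPO"] = mount_path

    return {
        "FunctionName": function_name,
        # Connect the Lambda to the VPC where the File System is provisioned.
        "VpcConfig": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
        # Attach the File System at the chosen local mount path.
        "FileSystemConfigs": [
            {"Arn": access_point_arn, "LocalMountPath": mount_path},
        ],
        "Environment": {"Variables": variables},
    }


# All values below are placeholders, not real resources.
config = build_lambda_config(
    function_name="my-dataflow-function",
    access_point_arn=("arn:aws:elasticfilesystem:us-east-1:123456789012:"
                      "access-point/fsap-0123456789abcdef0"),
    mount_path="/mnt/content-repo",
    subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
    security_group_ids=["sg-cccc3333"],
    existing_variables={"EXAMPLE_VAR": "value"},
)
print(config["Environment"]["Variables"]["CONTENT_REPO"])
```

If you use boto3, a dictionary like this could be passed as `boto3.client("lambda").update_function_configuration(**config)`. Because that call replaces the function's whole set of environment variables, any variables already required by your function need to be included in `existing_variables`.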