- Published on
Best fit for long-running task
- Authors
- Name
- Pursue
Catalogue
- 1 Foreword
- 2 Start with Lambda
- 3 Jump to Recursive Lambda
- 3 Adopt Step Functions
- 4 Re-engineer towards ECS Fargate(task standalone)
- 5 Go with AWS Batch
- 6 Summary
- 7 Cost Comparison
- 7 Reference
1 Foreword
When it comes to long-running tasks in AWS, the most common choices are AWS Lambda and AWS EC2. Lambda is popular for its low cost and ease of use, but each execution is limited to a maximum of 15 mins. EC2 offers much more flexibility and customization, but requires operational effort to maintain the infrastructure.
EC2 is generally the first choice once Lambda becomes a bottleneck. In this article, I'll explore which workarounds are worth considering before we jump to EC2.
2 Start with Lambda
Imagine we have a task that needs to run for a few mins every day. It's hard to pass over Lambda, because of the benefits it provides:
- Supports most mainstream programming languages
- Easy to scale and engineer
- You only focus on your code logic, as it is serverless
- Pay as you go at a very low cost, with no charge for idle time
- Supports both Zip and container image deployment
With these benefits, Lambda is a perfect fit for most of the scenarios at the beginning.
However, as your data volume grows, your daily task gets more and more complex and takes longer to complete, until one day it exceeds the 15 mins execution time limit.
How could we solve this problem? Especially when the total time is only slightly over the limit (e.g. 20 or 30 mins) and our code was running perfectly on Lambda, are we willing to abandon everything we've built?
3 Jump to Recursive Lambda
Fortunately, Lambda supports recursive invocation, which means your Lambda can call itself again and again until the task is eventually done.
To achieve this, the following steps are required:
- Break down the task into smaller pieces
- Each execution of Lambda should only handle one piece of the task
- The result of each execution is passed to the next one as its input
- The logic that decides when the task is finally done must be precise, so that there is no infinite loop
- Your Lambda IAM role needs the ability to invoke the Lambda function itself
Recursive Lambda keeps all your existing code and infrastructure the same and only adds a little logic to split your task, so it's a good choice when you don't want to change too much.
However, there is one critical limitation of recursive Lambda you should know before you decide to use it: the maximum number of recursive invocations is 16 (Lambda's recursive loop detection stops a chain after roughly that many invocations).
That being said, your maximum task time is limited to 16 * 15 mins = 240 mins. However, we can't be sure when AWS will tighten this further (there used to be no invocation limit for recursive Lambda), so there's no guarantee you can make full use of the 240 mins. You'd better design your recursive Lambda to need as few invocations as possible (fewer than 16).
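The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: `CHUNK_SIZE`, `MAX_INVOCATIONS`, `process_chunk`, `plan_next` and the event fields (`cursor`, `end`, `depth`, `total`) are hypothetical names chosen for the example, and the "work" is just a sum. The only AWS call is boto3's `invoke` on the Lambda client, used asynchronously (`InvocationType="Event"`) so that the current run can exit once it has handed over to the next one.

```python
import json

CHUNK_SIZE = 100      # hypothetical: units of work handled per invocation
MAX_INVOCATIONS = 10  # hypothetical safety cap, kept well below the limit above


def process_chunk(start, stop):
    """Stand-in for the real work: here it only sums the indexes in the chunk."""
    return sum(range(start, stop))


def plan_next(event):
    """Handle one piece; return (running_total, next_event or None when finished)."""
    cursor = event.get("cursor", 0)
    depth = event.get("depth", 0)
    stop = min(cursor + CHUNK_SIZE, event["end"])
    total = event.get("total", 0) + process_chunk(cursor, stop)
    if stop >= event["end"]:
        return total, None  # ending condition: nothing left to process
    if depth + 1 >= MAX_INVOCATIONS:
        raise RuntimeError("invocation cap reached before the task finished")
    return total, {"cursor": stop, "end": event["end"], "depth": depth + 1, "total": total}


def handler(event, context, invoke=None):
    """Lambda entry point; `invoke` is injectable so the logic can run without AWS."""
    total, next_event = plan_next(event)
    if next_event is None:
        return {"done": True, "total": total}
    if invoke is None:
        import boto3  # lazy import: only needed when really calling AWS
        invoke = boto3.client("lambda").invoke
    invoke(
        FunctionName=context.function_name,  # the function invokes itself
        InvocationType="Event",              # asynchronous, so this run can exit
        Payload=json.dumps(next_event).encode("utf-8"),
    )
    return {"done": False, "total": total}
```

Passing only a cursor and a running total keeps the payload small; in a real task the data itself would presumably live in storage such as S3 or a database, and the explicit depth counter is what guards against an infinite loop.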
Adopting recursive Lambda carries quite a mental burden, so how should we iterate from here?
3 Adopt Step Functions
Recursive Lambda is a good direction in terms of extending the execution time by invoking the same Lambda function many times. If we can eliminate the upper limit on invocations, it might be a perfect option for most cases.
Step Functions fits this purpose. You can use Step Functions to orchestrate the whole process as follows:
- Create your state machine to outline what steps are involved in your workflow
- Break down the task into smaller pieces
- The result of each execution is passed into the context of your workflow, and the next step reads it as input
- Define one condition that determines when your task is finished: if it is met, your workflow is done; otherwise, the next step invokes your Lambda function again, until the ending condition is met
- Step Functions has a retry mechanism for errors (in contrast, Recursive Lambda that is invoked synchronously is not retried by Lambda)
- Your Step Function IAM role needs the ability to invoke the Lambda function
It looks pretty much the same as Recursive Lambda, but the key difference is this: instead of the Lambda function invoking itself under a limit, Step Functions invokes the Lambda function on behalf of the workflow, so you can invoke it "infinite" times for as long as the task hasn't reached your ending condition.
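As a rough sketch of such a workflow, the snippet below builds an Amazon States Language definition with a Task state that invokes the Lambda function, a Choice state that checks the ending condition, and a Retry block. The state names, the `done` field, the retry values and the placeholder ARN are all assumptions made for the example; the Lambda function is assumed to return a JSON object containing a boolean `done` plus whatever state the next run needs.

```python
import json


def build_definition(function_arn):
    """Return a state machine definition that loops until `done` is true."""
    return {
        "Comment": "Re-invoke one Lambda function until it reports done",
        "StartAt": "ProcessChunk",
        "States": {
            "ProcessChunk": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                # Pass the whole workflow state to the function as its event.
                "Parameters": {"FunctionName": function_arn, "Payload.$": "$"},
                # Keep only the function's return value as the new state.
                "OutputPath": "$.Payload",
                "Retry": [{
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 5,   # illustrative values
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }],
                "Next": "IsDone",
            },
            "IsDone": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.done", "BooleanEquals": True, "Next": "Finished"}
                ],
                "Default": "ProcessChunk",  # not done yet: invoke again
            },
            "Finished": {"Type": "Succeed"},
        },
    }


# Placeholder ARN; replace REGION and ACCOUNT_ID with real values.
definition = build_definition("arn:aws:lambda:REGION:ACCOUNT_ID:function:my-task")
print(json.dumps(definition, indent=2))
```

The printed JSON is what you would supply as the state machine definition; the loop lives entirely in the Choice state's `Default` transition, so the Lambda code itself no longer needs permission to invoke anything.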
The diagram below illustrates how Step Function works in this scenario:
Step Functions has been a great solution so far for long-running tasks, but some of its cons are hidden behind the scenes and worth mentioning:
- It doesn't fundamentally solve the problem of Lambda's 15 mins execution time limit; it requires you to break down your task and manage the pieces within a workflow. As a result, a very time-consuming task still takes a long time to complete as a whole workflow, i.e. depending on how many 15-min pieces you split it into.
- Step Functions mainly focuses on the orchestration of your tasks. It is especially powerful for workflow management across different AWS services, so it might be overkill if you only need to extend the Lambda execution time.
4 Re-engineer towards ECS Fargate(task standalone)
Now suppose our original task is running exceedingly slowly. We can raise the Lambda memory to speed up the execution, but because of the 15 mins execution time limit, the number of child tasks is still high, so the whole task still takes a long time to complete.
For this case, we have to move our task to ECS. To be more specific, ECS with the Fargate launch type in standalone task mode is suitable for this scenario. You can achieve the same goal by following the steps below:
- Create an ECS cluster with the Fargate launch type, as it is serverless and you don't need to manage the underlying infrastructure
- Create a Fargate task definition with the same container image your task is already built on
- Use standalone task mode to run your task without the need for a service
- Schedule your task to run on a daily basis with the built-in Scheduled Tasks feature in ECS
Note: Fargate Spot is recommended for cost savings if your task can tolerate interruption
By doing so, you can still schedule many task instances to run concurrently, but the execution time is no longer limited.
The ECS option is very close to the EC2 direction, except that it is serverless and requires only minor operational effort. It is a good choice when you need to run a long-running task frequently.
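To make the standalone-task idea more concrete, here is a minimal sketch of starting such a task with boto3's `run_task` on the ECS client. The helper names (`build_run_task_args`, `run_standalone_task`), the cluster, task definition and subnet values are hypothetical. It assumes the task definition already exists, that the subnets can reach the image registry without a public IP, and, for the Spot variant, that the `FARGATE_SPOT` capacity provider is associated with the cluster.

```python
def build_run_task_args(cluster, task_definition, subnets, count=1, use_spot=False):
    """Build keyword arguments for ecs.run_task for a standalone Fargate task."""
    args = {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "count": count,  # number of identical task instances to start
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "assignPublicIp": "DISABLED",  # assumption: private subnets
            }
        },
    }
    if use_spot:
        # A capacity provider strategy and launchType cannot be combined.
        args["capacityProviderStrategy"] = [
            {"capacityProvider": "FARGATE_SPOT", "weight": 1}
        ]
    else:
        args["launchType"] = "FARGATE"
    return args


def run_standalone_task(ecs_client, **kwargs):
    """Start the task(s) without an ECS service and return their ARNs."""
    response = ecs_client.run_task(**build_run_task_args(**kwargs))
    return [task["taskArn"] for task in response["tasks"]]
```

A call might look like `run_standalone_task(boto3.client("ecs"), cluster="my-cluster", task_definition="my-task:1", subnets=["subnet-..."], use_spot=True)`, with all of those values replaced by your own.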
5 Go with AWS Batch
AWS Batch is the last option I'd like to introduce. It can leverage ECS Fargate under the hood, but the subtle difference is that AWS Batch is more suitable for batch processing tasks. Let's imagine the scenario again:
You have a large number of tasks to run concurrently (say 100 tasks). Every 10 tasks form a group that needs to complete or fail together, and the groups need to be scheduled to run one after another.
AWS Batch can achieve this by following the steps below:
- Define the max vCPU and memory that all your tasks can use as a resource pool
- Create a job queue to manage job submission and scheduling
- Create a job definition to define the job's requirements (similar to an ECS task definition)
- Submit a job to the queue via the ECS Fargate option with an array size of 10; the job stays pending if there are not enough resources to run it
- Supposing the submitted job id is job-1, once it starts, you can see in the ECS cluster that there are 10 tasks running concurrently
- While job-1 is running, you can submit another job, job-2, to the queue in the same way, but mark job-1 as a dependency of job-2, so that job-2 only starts when job-1 is done
- Repeat the above steps until all your tasks are done
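The submission steps above could be scripted along these lines. This is only a sketch: `submit_groups` and the job names are hypothetical, the job queue and job definition are assumed to exist already, and the only AWS call is boto3's `submit_job` on the Batch client, using its `arrayProperties` and `dependsOn` parameters.

```python
def submit_groups(batch_client, job_queue, job_definition, group_count, group_size=10):
    """Submit `group_count` array jobs, each one depending on the previous one."""
    job_ids = []
    previous_job_id = None
    for index in range(group_count):
        request = {
            "jobName": "group-%d" % (index + 1),  # hypothetical naming scheme
            "jobQueue": job_queue,
            "jobDefinition": job_definition,
            # One array job fans out into `group_size` child tasks.
            "arrayProperties": {"size": group_size},
        }
        if previous_job_id is not None:
            # This group waits until the previous group has finished.
            request["dependsOn"] = [{"jobId": previous_job_id}]
        response = batch_client.submit_job(**request)
        previous_job_id = response["jobId"]
        job_ids.append(previous_job_id)
    return job_ids
```

For the scenario of 100 tasks in groups of 10, this would be called with `group_count=10`; all groups can be submitted up front, and the queue holds each one back until its dependency is done.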
As you can see, AWS Batch is more about batch job management and scheduling. It is a good choice when you have a large number of tasks to run concurrently and need to manage job dependencies, which is much more difficult to accomplish with ECS Fargate alone.
6 Summary
In this article, I've introduced 4 options to consider before jumping to EC2 when Lambda can't meet your requirements. Each of them has its pros and cons, and you can choose the best fit based on your specific scenario:
- Recursive Lambda
  - Pros: Keep your existing code and infrastructure the same
  - Cons: Limited to 16 recursive invocations
- Step Functions
  - Pros: No invocation limit
  - Cons: Overkill for extending Lambda execution time
- ECS Fargate (task standalone)
  - Pros: Serverless and no need to manage the underlying infrastructure
  - Cons: Minor operational effort
- AWS Batch
  - Pros: Batch job management and scheduling
  - Cons: A bit more operational effort
7 Cost Comparison
Given the following task requirements:
Region | Memory | Duration | Arch | Task count |
---|---|---|---|---|
Sydney | 8 GB | 15 mins | ARM | 1 |
- AWS Lambda
Cost: ~0.096 USD for a single 15-min run
Price calculation:
0.0001067 USD per second (8 GB) x 900 seconds = 0.096 USD
- AWS ECS Fargate
Cost: ~0.56 USD per month without Spot (one run per day, i.e. 30.42 runs per month); less than 0.56 USD with Spot
Price calculation (non-Spot):
30.42 tasks x 1 vCPU x 0.25 hours x 0.03885 USD per hour = 0.30 USD for vCPU hours
30.42 tasks x 8.00 GB x 0.25 hours x 0.00426 USD per GB per hour = 0.26 USD for GB hours
20 GB - 20 GB (no additional charge) = 0.00 GB billable ephemeral storage per task
0.30 USD for vCPU hours + 0.26 USD for GB hours = 0.56 USD total
Fargate cost (monthly): 0.56 USD
- AWS Batch
There is no additional charge for AWS Batch itself; you pay for the resources it uses behind the scenes, e.g. the price is the same as ECS Fargate if the job is submitted as an ECS Fargate task.
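Note that the Lambda figure above covers a single 15-min run, while the Fargate figure covers a month of daily runs (30.42 tasks). The short calculation below puts both on the same basis using only the rates quoted in this section. The function and constant names are hypothetical, and it assumes 1 vCPU for Fargate as in the calculation above, while ignoring request charges, free tier, storage and data transfer.

```python
# Rates as quoted in this section (Sydney, ARM).
LAMBDA_USD_PER_SECOND_8GB = 0.0001067
FARGATE_USD_PER_VCPU_HOUR = 0.03885
FARGATE_USD_PER_GB_HOUR = 0.00426
RUNS_PER_MONTH = 30.42  # one run per day, averaged over a month


def lambda_cost(seconds, runs=1):
    """Compute cost in USD of `runs` Lambda executions at 8 GB."""
    return LAMBDA_USD_PER_SECOND_8GB * seconds * runs


def fargate_cost(hours, vcpu, memory_gb, runs=1):
    """Compute cost in USD of `runs` Fargate tasks (vCPU hours plus GB hours)."""
    per_hour = vcpu * FARGATE_USD_PER_VCPU_HOUR + memory_gb * FARGATE_USD_PER_GB_HOUR
    return per_hour * hours * runs


for label, runs in (("per run", 1), ("per month", RUNS_PER_MONTH)):
    print(label,
          "Lambda: %.3f USD" % lambda_cost(900, runs),
          "Fargate: %.3f USD" % fargate_cost(0.25, 1, 8, runs))
```

Because the unrounded sum is used here, the monthly Fargate result can differ by about a cent from the 0.56 USD above, which adds up two already-rounded components.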