FAQs
S3 Select is a lightweight feature that lets an analyst use SQL to query a single object, whereas Athena can query multiple objects at once. Athena also supports complex queries, while S3 Select handles only simple ones. Athena can be used from the AWS Console, whereas S3 Select is an API.
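As a rough illustration of the difference in scope, the sketch below builds the request parameters for each service. It takes the S3 side to mean the S3 Select API and assumes the boto3 client methods `select_object_content` (S3) and `start_query_execution` (Athena). The bucket, key, database, table, and output location are hypothetical placeholders, and nothing here contacts AWS.

```python
def s3_select_params(bucket, key, expression):
    """Build keyword arguments for boto3's s3.select_object_content.

    The request names exactly one object (Bucket + Key).
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "Expression": expression,
        "ExpressionType": "SQL",
        # Assumption: the object is a CSV file with a header row.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


def athena_params(database, query, output_location):
    """Build keyword arguments for boto3's athena.start_query_execution.

    The query targets a table, which may be backed by many S3 objects.
    """
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical names, for illustration only.
select_request = s3_select_params(
    "example-bucket",
    "logs/2024-01-01.csv",
    "SELECT s.status FROM S3Object s WHERE s.status = '500'",
)
athena_request = athena_params(
    "example_db",
    "SELECT status, COUNT(*) AS hits FROM logs GROUP BY status",
    "s3://example-bucket/athena-results/",
)
```

With real credentials, these dictionaries could be passed as `client.select_object_content(**select_request)` and `client.start_query_execution(**athena_request)`.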
The advantages and disadvantages of using Athena are as follows:
Advantages:
- Users can run multiple complex queries in parallel.
- Athena runs SQL queries on Presto, an open-source engine optimised for data analysis.
- Athena is serverless, so there is no infrastructure to manage.
- It is cost-effective: you pay only for the data that is scanned, at approximately $5 per terabyte.
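To make the pay-per-scan pricing concrete, here is a minimal sketch that estimates the cost of a query from the number of bytes scanned. It uses the approximate $5 per terabyte figure quoted above. Treating a terabyte as 10^12 bytes is an assumption, and the function name is hypothetical.

```python
# Assumption: a terabyte is counted as 10**12 bytes for billing purposes.
BYTES_PER_TB = 10 ** 12


def estimate_scan_cost(bytes_scanned, price_per_tb=5.0):
    """Return the approximate query cost in US dollars."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# A query that scans 200 GB costs roughly one dollar under these assumptions.
cost = estimate_scan_cost(200 * 10 ** 9)
```

Because the cost depends only on bytes scanned, anything that reduces the scan (for example, selecting fewer columns from a columnar format) reduces the bill.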
Disadvantages:
- Athena cannot optimise the data stored in S3; optimisation is limited to the queries themselves.
- A single row or column cannot exceed 32 megabytes in size.
- If a source file name starts with an underscore or a dot, the file is treated as hidden.
- Athena queries can time out when a table has thousands of partitions.
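The hidden-file rule in the list above can be checked before uploading data. The following is a minimal sketch, assuming the rule applies to the final path component of the object key; `is_hidden` and the sample keys are hypothetical.

```python
import posixpath


def is_hidden(key):
    """Return True if the file name part of an S3 key starts with '_' or '.'."""
    name = posixpath.basename(key)
    return name.startswith(("_", "."))


# Hypothetical object keys under one table location.
keys = ["data/part-0000.csv", "data/_SUCCESS", "data/.staging.csv"]

# Keep only the keys that would be visible under the rule above.
visible = [k for k in keys if not is_hidden(k)]
```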
The limitations of Amazon Athena are:
- AWS imposes service limits. For instance, the Amazon S3 limit is 100 buckets per account by default, though it can be increased to 1,000 buckets per account.
- The Amazon Athena query string has a hard limit of 262,144 bytes.
- An Athena Data Manipulation Language (DML) query is limited to 30 minutes of run time.
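The query string limit can be checked on the client before a query is submitted. This is a minimal sketch, assuming the limit is measured on the UTF-8 encoding of the query text; `query_fits` is a hypothetical helper, not part of any AWS SDK.

```python
# Hard limit on the query string length, in bytes, as stated above.
MAX_QUERY_BYTES = 262144


def query_fits(query):
    """Return True if the query is within the byte limit.

    Assumption: the limit applies to the UTF-8 encoded query text.
    """
    return len(query.encode("utf-8")) <= MAX_QUERY_BYTES


short_ok = query_fits("SELECT 1")
# A literal padded to the limit pushes the whole statement over it.
long_ok = query_fits("SELECT '" + "x" * MAX_QUERY_BYTES + "'")
```

Measuring bytes rather than characters matters when a query contains non-ASCII literals, since those characters occupy more than one byte each in UTF-8.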
Both Redshift Spectrum and Amazon Athena are serverless, but they differ from one another in several respects:
- To return query results, Athena relies on pooled resources provided by AWS, whereas in Redshift Spectrum the resources are determined by the size of the Redshift cluster.
- Redshift Spectrum gives users more control than Athena.
- If you want a specific query to return quickly, you can allocate additional compute resources in Redshift Spectrum. The same cannot be done with Athena.
AWS Glue is an ETL service that lets users transform data and manage data pipelines. It is therefore more of a transformation and data movement tool. Amazon Athena, on the other hand, is mainly used as a query tool for data analysis.