
VPC flow logs capture information about the IP traffic going to and from network interfaces in a VPC. Athena is an interactive query service that makes it easy to analyze data directly in S3 using standard SQL.
Topology

Create a (S3) Bucket


Enable (VPC) Flow Logs


Apache Parquet is a columnar data format that stores and queries data more efficiently and cost-effectively than a text format. Queries on data stored in Parquet format are 10 to 100 times faster and cheaper than data stored in text format. Flow logs delivered in Parquet format with Gzip compression use about 20 percent less storage space in Amazon S3 than flow logs delivered in text format with Gzip compression, further reducing storage and query costs.

Athena Integration




Cloud Formation



Athena



Testing

Bandwidth Consumption

Breaking total bytes by vpc:

Query Examples
- daily bandwidth utilization
SELECT SUM(bytes) AS total_bytes
FROM <table_name>
WHERE DATE(date_parse("@timestamp", '%Y-%m-%dT%H:%i:%S.%fZ')) = date_parse('2022-02-15', '%Y-%m-%d');
- daily bandwidth utilization (range)
SELECT DATE(date_parse("@timestamp", '%Y-%m-%dT%H:%i:%S.%fZ')) AS date, SUM(bytes) AS total_bytes
FROM <table_name>
WHERE DATE(date_parse("@timestamp", '%Y-%m-%dT%H:%i:%S.%fZ')) BETWEEN date_parse('start_date', '%Y-%m-%d') AND date_parse('end_date', '%Y-%m-%d')
GROUP BY DATE(date_parse("@timestamp", '%Y-%m-%dT%H:%i:%S.%fZ'));
- max daily bandwidth utilization
SELECT DATE(date_parse("@timestamp", '%Y-%m-%dT%H:%i:%S.%fZ')) AS date, SUM(bytes) AS total_bytes
FROM <table_name>
WHERE date_parse("@timestamp", '%Y-%m-%dT%H:%i:%S.%fZ') BETWEEN date_parse('start_date', '%Y-%m-%d') AND date_parse('end_date', '%Y-%m-%d')
GROUP BY DATE(date_parse("@timestamp", '%Y-%m-%dT%H:%i:%S.%fZ'));
- top talkers
SELECT srcaddr, SUM(bytes) AS total_bytes
FROM <table_name>
WHERE date_parse("@timestamp", '%Y-%m-%dT%H:%i:%S.%fZ') BETWEEN date_parse('start_date', '%Y-%m-%d') AND date_parse('end_date', '%Y-%m-%d')
GROUP BY srcaddr
ORDER BY total_bytes DESC;
- top services/port
SELECT dstport, SUM(bytes) AS total_bytesFROM <table_name>WHERE date_parse("@timestamp", '%Y-%m-%dT%H:%i:%S.%fZ') BETWEEN date_parse('start_date', '%Y-%m-%d') AND date_parse('end_date', '%Y-%m-%d')GROUP BY dstportORDER BY total_bytes DESC;
- malicious/denied traffic
SELECT *
FROM <table_name>
WHERE action = 'DENY';
- trend
SELECT DATE_FORMAT(date_parse("@timestamp", '%Y-%m-%dT%H:%i:%S.%fZ'), '%Y-%m-%d %H:00:00') AS hour, SUM(bytes) AS total_bytes
FROM <table_name>
WHERE date_parse("@timestamp", '%Y-%m-%dT%H:%i:%S.%fZ') BETWEEN date_parse('start_date', '%Y-%m-%d') AND date_parse('end_date', '%Y-%m-%d')
GROUP BY hour;