Seeing my digital portfolio ecosystem take shape after several days of effort has been exhilarating. After a barrage of millions of SSH brute-force attempts, I have come to appreciate how important robust security measures are. Before adopting a SIEM product to oversee my entire ecosystem, I wanted to explore whether I could use automation to monitor malicious event types in my AWS account. So I devised an automation strategy: a Jenkins pipeline that runs AWS Athena queries against AWS CloudTrail logs and then uses Python's Matplotlib to visualize the results.
The pipeline ultimately did not succeed because of a troublesome duplicate key named “tagging” in the raw CloudTrail data; working around it would demand considerable effort to restructure the data. Consequently, I am considering a SIEM product as an alternative. Despite this setback, the entire endeavor has been a valuable learning experience. Below, I outline the key steps involved.
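To check whether a given log file actually carries such a key, the raw JSON can be scanned before Athena ever sees it. The sketch below is my illustration, not part of the original pipeline: the function names are hypothetical, and it assumes the collision shows up either as a literal repeat of "tagging" or as keys that differ only by case (for example "Tagging" and "tagging"), since, as far as I know, the JSON SerDes commonly used for CloudTrail tables in Athena match keys case-insensitively by default. It relies on the `object_pairs_hook` parameter of Python's `json` module, which receives every key/value pair, duplicates included.

```python
import gzip
import json
from typing import Any


def find_duplicate_keys(raw_json: str) -> list[tuple[str, str]]:
    """Return (first_key, repeated_key) pairs that collide inside one JSON object.

    Catches exact repeats ("tagging" twice) as well as keys that differ only
    by case ("Tagging" vs "tagging"), which a case-insensitive JSON
    deserializer would map to the same column.
    """
    collisions: list[tuple[str, str]] = []

    def check_object(pairs: list[tuple[str, Any]]) -> dict[str, Any]:
        # json calls this for every object, innermost first, with ALL key/value
        # pairs in document order, so duplicates are still visible here.
        seen: dict[str, str] = {}
        for key, _value in pairs:
            folded = key.lower()
            if folded in seen:
                collisions.append((seen[folded], key))
            else:
                seen[folded] = key
        return dict(pairs)  # later duplicates win, matching json's default

    json.loads(raw_json, object_pairs_hook=check_object)
    return collisions


def scan_cloudtrail_file(path: str) -> list[tuple[str, str]]:
    """Check one delivered CloudTrail log file (gzipped JSON with a "Records" list)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return find_duplicate_keys(handle.read())
```

Running `scan_cloudtrail_file` over a sample of delivered `.json.gz` files would show which events carry the conflicting key before deciding whether restructuring the data is worth the effort.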
Here’s a concise overview of the steps I took to build this pipeline:
1. Created an S3 bucket named s3://xxxxx-aws-cloudtrail-logs-xxxxx to store CloudTrail logs.
2. Generated a regional customer-managed KMS key.
3. Created a trail with the following settings:
   - Multi-region option: selected
   - Log file SSE-KMS encryption: enabled, using the key created in step 2
   - Trail log location: the bucket created in step 1
4. Created an S3 bucket named s3://xxxxx-user-activitiy-athena-result-xxxxx to store Athena query results.
5. Signed in to the AWS account as an Admin user and navigated to the AWS Athena console.
6. Configured the Query Result Location to point to the bucket created in step 4.
7. Executed the following SQL command to create a new database:
   CREATE DATABASE IF NOT EXISTS cloudtrail_logs_db;
8. Used an SQL statement from my GitHub repository to create a cloudtrail_logs table that matches the structure of the CloudTrail logs:
   https://github.com/hihinsonli/Automation_Jenkins/blob/main/aws/athena/cloudtrail_query.sql
9. Prepared a Python script to test the connectivity between the Jenkins job and AWS Athena query execution:
   https://github.com/hihinsonli/Automation_Jenkins/blob/main/python/check_athena_query_status.py
10. Prepared a Python script to visualize the CloudTrail data filtered by the Athena query:
   https://github.com/hihinsonli/Automation_Jenkins/blob/main/python/visualize_cloudtrail_data.py
11. Developed the Jenkinsfile available at the GitHub link below:
   https://github.com/hihinsonli/Automation_Jenkins/blob/main/Jenkinsfile/monitor-aws-account-user-activities.Jenkinsfile
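For readers who prefer to script the console steps 1 to 3, a rough boto3 equivalent of the trail setup might look like this. It is a hedged sketch rather than what I actually ran: the trail name and key ARN are placeholders, the client is passed in so the function can be exercised without AWS credentials, and it assumes the S3 bucket policy and the KMS key policy already allow CloudTrail to write and encrypt logs (I would expect `create_trail` to fail otherwise).

```python
from typing import Any


def create_encrypted_trail(cloudtrail: Any, trail_name: str, bucket: str, kms_key_id: str) -> str:
    """Create a multi-region trail encrypted with SSE-KMS, then start logging.

    `cloudtrail` is a boto3 CloudTrail client or any stand-in with the same
    two methods. Returns the ARN of the new trail.
    """
    response = cloudtrail.create_trail(
        Name=trail_name,
        S3BucketName=bucket,      # step 1 bucket: plain name, no "s3://" prefix
        IsMultiRegionTrail=True,  # step 3: multi-region option selected
        KmsKeyId=kms_key_id,      # step 2 key: enables SSE-KMS for log files
    )
    # CreateTrail does not turn logging on; a new trail records nothing until
    # StartLogging is called.
    cloudtrail.start_logging(Name=trail_name)
    return response["TrailARN"]


if __name__ == "__main__":
    import boto3  # needs credentials allowed to call CreateTrail and StartLogging

    print(create_encrypted_trail(
        boto3.client("cloudtrail"),
        trail_name="account-activity-trail",                    # hypothetical name
        bucket="xxxxx-aws-cloudtrail-logs-xxxxx",               # masked name from step 1
        kms_key_id="arn:aws:kms:REGION:ACCOUNT_ID:key/KEY_ID",  # placeholder ARN
    ))
```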
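The connectivity check in step 9 essentially means submitting a query and polling until Athena reports a terminal state. I have not reproduced my actual script here; the following is an independent sketch built on boto3's `start_query_execution` and `get_query_execution` calls. The database name and result bucket come from steps 4 and 7, while the function name, poll interval, and timeout are assumptions.

```python
import time
from typing import Any

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(athena: Any, sql: str, database: str, output_location: str,
                     poll_seconds: float = 2.0,
                     timeout_seconds: float = 300.0) -> tuple[str, str]:
    """Submit `sql` to Athena and wait for it to finish.

    Returns (query_execution_id, final_state). `athena` is a boto3 Athena
    client or any object exposing the same two methods.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Athena query {query_id} still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)  # QUEUED or RUNNING: wait, then poll again


if __name__ == "__main__":
    import sys

    import boto3

    query_id, state = run_athena_query(
        boto3.client("athena"),
        sql="SELECT 1",  # trivial query: only proves Jenkins can reach Athena
        database="cloudtrail_logs_db",
        output_location="s3://xxxxx-user-activitiy-athena-result-xxxxx/",
    )
    print(query_id, state)
    sys.exit(0 if state == "SUCCEEDED" else 1)  # non-zero exit fails the Jenkins sh step
```

Exiting with a non-zero status is what lets a Jenkins `sh` step mark the stage as failed, so the pipeline stops early when Athena is unreachable or the query errors out.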
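The visualization in step 10 could be approached along these lines: count occurrences of each event name in the Athena result rows and draw a horizontal bar chart. Again, this is an assumption-laden sketch, not the script in my repository. It assumes the result contains a column named `eventname` (a hypothetical name; adjust it to whatever the table in step 8 defines), reads Athena's CSV result file with `csv.DictReader`, and imports Matplotlib only when the chart is actually drawn.

```python
import csv
from collections import Counter
from typing import Iterable, Mapping


def count_events(rows: Iterable[Mapping[str, str]], column: str = "eventname") -> Counter:
    """Tally how often each non-empty value of `column` appears in the rows."""
    return Counter(row[column] for row in rows if row.get(column))


def load_rows(csv_path: str) -> list[dict[str, str]]:
    """Read an Athena result CSV (header row first) into dictionaries."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


def plot_top_events(counts: Counter, output_png: str, top_n: int = 10) -> None:
    """Save a horizontal bar chart of the `top_n` most frequent events."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend: Jenkins agents usually have no display
    import matplotlib.pyplot as plt

    top = counts.most_common(top_n)
    labels = [name for name, _ in reversed(top)]  # most frequent ends up on top
    values = [count for _, count in reversed(top)]
    fig, ax = plt.subplots(figsize=(8, 0.4 * len(top) + 1))
    ax.barh(labels, values)
    ax.set_xlabel("Event count")
    ax.set_title("Most frequent CloudTrail events")
    fig.tight_layout()
    fig.savefig(output_png)
    plt.close(fig)


if __name__ == "__main__":
    rows = load_rows("athena_result.csv")  # hypothetical local copy of the query result
    plot_top_events(count_events(rows), "cloudtrail_events.png")
```

Keeping the counting separate from the plotting means the aggregation can be checked on its own, and the saved PNG can be archived as a Jenkins build artifact.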
If you are interested in viewing my Jenkins job output, feel free to visit https://automation.hinsonli.com and sign in with my visitor account (username: visitor, password: Visitor123).
Any advice would be greatly appreciated; please leave your insights in the comment section.
My exploration of integrating AWS CloudTrail with AWS S3, AWS Athena, Jenkins, and Python for data visualization has been a mix of anticipation, challenges, and valuable learning. Initially driven by the desire to automate security monitoring for my digital ecosystem, I envisioned a pipeline that could transform raw CloudTrail logs into insightful visual analytics. This ambition, however, met unexpected hurdles, notably a tricky issue with duplicate “tagging” keys in the CloudTrail data, which caused my Athena queries to return empty results.
Despite the setbacks, this exploration was far from a failure. It underscored the importance of carefully understanding and preparing the data structure, as well as the complexity of processing and visualizing large data sets. The realization that not all data anomalies are straightforward to resolve has steered me towards considering Security Information and Event Management (SIEM) products as a more robust solution for my needs.
Throughout this process, from setting up AWS S3 buckets and KMS keys to writing Python scripts for data visualization, I’ve gained a deeper appreciation for the intricacies of cloud services and the power of automation in cloud security. The technical challenges encountered have not only enhanced my problem-solving skills but also enriched my understanding of AWS services and their interplay.
As I document my steps and reflections, I aim to share not just a guide for setting up a similar pipeline but also the critical lessons learned about the importance of security, the challenges of data management, and the beauty of automation in technology. This journey, though marked by an unanticipated hurdle, has been a profound learning experience, emphasizing that in the realm of technology and security, every challenge is an opportunity to grow and learn.