Cloud-based storage is one of the cornerstones of the internet today: most images and files you upload to applications end up in unstructured storage buckets or blobs on one cloud platform or another. These storage containers can be misconfigured in numerous different ways, and sensitive data is often exposed as a result. This is no shock when you look at the process behind setting up these storage containers, whether it be AWS S3 buckets, Azure Blobs or Google Cloud Storage buckets. The process simply involves ticking a series of checkboxes which control the configuration of the container. You will notice just how easy it is to make a mistake during this process, especially if you are not aware of what each option means and the consequences of a misconfiguration.
The initial idea of this research was to develop a tool that would allow a large number of S3 buckets to be scanned quickly, looking for such misconfigurations. A very simple tool was written in Python which conducted subdomain enumeration on *.s3.amazonaws.com and then iterated through the enumerated domains checking whether directory listing was enabled, which is a great indicator of a misconfigured S3 bucket. You can see what an incorrectly configured S3 bucket looks like below.
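The listing check described above can be sketched in a few lines. When directory listing is enabled, an unauthenticated GET against the bucket's virtual-hosted URL returns an XML `ListBucketResult` document rather than an `AccessDenied` error, so detecting the root element is enough. This is a minimal reconstruction of the idea, not the original tool's code:

```python
import urllib.request
import xml.etree.ElementTree as ET

def is_directory_listing(body: str) -> bool:
    """True if a response body looks like an S3 ListBucketResult,
    i.e. the bucket has directory listing enabled."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    # S3 namespaces the root element; strip the namespace before comparing.
    return root.tag.split("}")[-1] == "ListBucketResult"

def check_bucket(name: str) -> bool:
    """Fetch the bucket's virtual-hosted URL and test for an open listing."""
    url = f"http://{name}.s3.amazonaws.com/"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return is_directory_listing(resp.read().decode("utf-8", "replace"))
    except OSError:
        return False
```

Running `check_bucket` over every name produced by the subdomain enumeration step gives you the candidate list of misconfigured buckets.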
The tool was very simple to build because distinguishing between correctly and incorrectly configured buckets was easy to implement. Also, since directory listing was enabled on the majority of these misconfigured buckets, it was easy to parse the listings for any file extensions and keywords that might be of interest. Common extensions and keywords that we implemented into the scanning functionality were: sql, sql.gz, backup.zip, backup.gz, backup.tar and backup.tar.gz, alongside many more.
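The filtering step amounts to matching each listed key against that set of suffixes. A minimal sketch of how it might look, using only the extensions named above (the original tool's full list was longer):

```python
# Suffixes taken from the extension list above; the real tool
# matched "many more" patterns and keywords than shown here.
INTERESTING_SUFFIXES = (
    ".sql", ".sql.gz", "backup.zip", "backup.gz",
    "backup.tar", "backup.tar.gz",
)

def interesting_keys(keys):
    """Return the object keys whose names end with a sensitive suffix."""
    return [key for key in keys
            if key.lower().endswith(INTERESTING_SUFFIXES)]
```

Fed with the keys parsed out of a `ListBucketResult`, this immediately surfaces the database dumps and backup archives worth reporting.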
The image above also shows what a real-life positive hit looks like for a misconfigured bucket containing a sensitive file matching our keywords/extensions. You can see the presence of an sql.gz file, which in this case was a backup of a SQL database. These files often contain sensitive information ranging from account data to passwords and PII.
Once tool development was finalised and the data from the runs was analysed, we decided to conduct more research to broaden our attack surface and find more buckets to scan. Upon doing so, we came across https://buckets.grayhatwarfare.com/, which had essentially done the job for us. This site is nothing short of incredible for this area of research: it currently has hundreds of thousands of S3 buckets scanned and indexed, alongside tens of thousands of Azure Blobs. Grayhat is a user-friendly web application that allows you to search through the indexed data using both keywords and extensions, very much our initial simple Python script on steroids.
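The same keyword-and-extension search can be driven programmatically. The sketch below shows the general shape of a query against Grayhat's file-search API; the endpoint path, parameter names and bearer-token auth scheme are assumptions based on our reading of its documentation, so check the current API docs before relying on them:

```python
import json
import urllib.parse
import urllib.request

# Base path and parameter names below are assumptions; verify against
# the current GrayhatWarfare API documentation.
API_BASE = "https://buckets.grayhatwarfare.com/api/v2"

def build_search_url(keywords, extensions, start=0, limit=100):
    """Build a file-search URL for the given keywords/extensions."""
    params = {
        "keywords": " ".join(keywords),
        "extensions": ",".join(extensions),
        "start": start,
        "limit": limit,
    }
    return f"{API_BASE}/files?" + urllib.parse.urlencode(params)

def search_files(token, keywords, extensions):
    """Query the API and return the list of matching files."""
    req = urllib.request.Request(
        build_search_url(keywords, extensions),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp).get("files", [])
```

With an API key, paging `start` through the result set lets you pull down far more candidates than any manual search would.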
With the initial issue of attack surface now solved, we decided to fully utilize Grayhat and its API. We changed our approach and wrote a Python tool that used the API to make a large number of requests and extract buckets/blobs which contained interesting files. We would also check whether any of the buckets we were reporting were writable using the command: aws s3 cp proof.txt s3://[BUCKET_NAME] --no-sign-request. This would often further increase the impact, as an attacker could host malicious files on the company's storage, or even just run up a pricey AWS bill for them. This is definitely one to try on any buckets you find that belong to bounty programs!
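The write check above is just an unsigned upload attempt of a harmless proof file; it can be automated per bucket by wrapping the AWS CLI. A minimal sketch (requires the AWS CLI on PATH, and should only ever be run against in-scope targets):

```python
import subprocess

def write_check_cmd(bucket: str, proof: str = "proof.txt"):
    """Build the unsigned-upload command:
    aws s3 cp proof.txt s3://<bucket> --no-sign-request"""
    return ["aws", "s3", "cp", proof, f"s3://{bucket}", "--no-sign-request"]

def is_writable(bucket: str) -> bool:
    """Return True if the unsigned upload succeeds (CLI exit code 0),
    meaning the bucket accepts writes from anonymous users."""
    result = subprocess.run(write_check_cmd(bucket),
                            capture_output=True, text=True)
    return result.returncode == 0
```

A writable hit is worth calling out separately in a report, since it upgrades the finding from data exposure to content injection.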
In conclusion, cloud-based storage is great, but it is very easy to make catastrophic mistakes. If you are setting up storage containers, please ensure you test the access controls yourself before uploading any sensitive files. We would also advise anyone keen to start on this research to use Grayhat, as it is an incredible asset. There are still many, many public buckets out there with SQL database backups sitting on them, so do your part and help secure these companies by disclosing your findings responsibly; you may just come across some that reward you for doing so, like our findings below.