Intelligence Information Gathering: Collecting Twitter Followers with 25 Lines of Python

Aerosol · January 9, 2015

Many corporations are not aware of the types of data that can be found and used by attackers in the wild. The information that you will be able to find will vary from target to target, but will typically include items such as IP ranges, domain names, e-mail addresses, public financial data, organizational information, technologies used, job titles, phone numbers, usernames and much more. The primary goal of the passive gathering stage is to gather as much actionable data as possible while at the same time leaving few or no indicators that anyone has searched for the data. It takes time and patience to sort through web pages, perform Google hacking, and map systems thoroughly in an attempt to understand the infrastructure of a particular target.

In this article, let’s assume that we have been asked to perform a penetration test on an online banking system to check whether valid usernames and passwords can be guessed. If you were a hacker, what would you do?

Speaking for myself, I would first write a quick script to create a dictionary file of potential usernames. Second, I would find out the company’s password policy (password length, number of special characters, and so on) and build my own password dictionary file based on it. Finally, I would automate the process to see whether I can get a correct password, or perhaps cause a DoS by locking the account after X failed attempts.
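The second step can be sketched as a simple filter over an existing wordlist. This is only an illustration: the policy values (minimum length 8, at least one special character), the set of characters treated as special, and the function names are all assumptions, not details of any real bank's policy.

```python
# Characters counted as "special" here; an assumption for illustration.
SPECIAL = set("!@#$%^&*()-_=+[]{};:,.<>?/")


def meets_policy(candidate, min_length=8, min_special=1):
    """Return True if the candidate satisfies the (hypothetical) policy."""
    specials = sum(1 for ch in candidate if ch in SPECIAL)
    return len(candidate) >= min_length and specials >= min_special


def filter_wordlist(words, min_length=8, min_special=1):
    """Keep only the words that the target policy would accept."""
    return [w for w in words if meets_policy(w, min_length, min_special)]


candidates = ["password", "P@ssw0rd!", "abc!"]
print(filter_wordlist(candidates))
```

Filtering first keeps the dictionary small, since any candidate the policy would reject can never be a valid password.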

Many users use the same username for their bank account, Facebook, Twitter, and other social media. So let’s forge a small Python script to illustrate how an attacker could take ordinary, publicly available information and build a dictionary file containing the Twitter followers of Arab Bank. At the time of writing, Arab Bank has around 24,027 followers. Let’s bring them up!

**Disclaimer: all of the actions explained in this article fall under passive information gathering and are considered legitimate. We are just spotlighting a smart way of collecting data.**

Build your own dictionary file

Twitter and many other social websites offer an API (Application Programming Interface), which allows a programmer to write code that interacts with Twitter and gets information from it or posts information to it. Fortunately, Python has many libraries that make the job much easier, so all I need to do is register as a Twitter developer and use the developer ID/keys in my script.

Tweepy is a third-party Python library that allows us to query Twitter’s data. Installing Tweepy is pretty easy:

hkhrais@Hkhrais:~$ sudo apt-get install python-pip

hkhrais@Hkhrais:~$ sudo pip install tweepy

Source Code

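import tweepy
import time

# Insert your Twitter keys here
consumer_key = 'blah blah blah'
consumer_secret = 'blah blah blah'
access_token = 'blah blah blah'
access_secret = 'blah blah blah'

# Authenticate to Twitter with OAuth
auth = tweepy.auth.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

# Output file: one screen name per line
outfile = open('/home/hkhrais/Desktop/list.txt', 'w')

# Confirm that the keys are accepted
if api.verify_credentials():
    print('We successfully logged in')

# The cursor pages through the followers of the target account
user = tweepy.Cursor(api.followers, screen_name="arabbankgroup").items()

while True:
    try:
        u = next(user)
        outfile.write(u.screen_name + '\n')

    except:
        # Most likely the rate limit: wait out the 15-minute window and retry.
        # The bare except also catches the final StopIteration, so the retry
        # below raises it again and the script ends with a traceback when done.
        print('We got a timeout ... Sleeping for 15 minutes')
        time.sleep(15*60)
        u = next(user)
        outfile.write(u.screen_name + '\n')

# Not reached: the loop only ends through the StopIteration traceback
outfile.close()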

The code is almost self-explanatory. I passed the consumer and token keys to the function “OAuthHandler” to authenticate myself to Twitter, then asked for the followers of ‘arabbankgroup’ and stored the resulting iterator in the variable “user”. Each follower’s screen name is then written to the dictionary file, one per line.
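Rather than pasting the four keys into the script, they could be read from the environment so that the script can be shared without leaking them. A minimal sketch, assuming hypothetical environment variable names (nothing in Tweepy or Twitter requires these names):

```python
import os

# Hypothetical variable names, chosen for this example only.
REQUIRED = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def load_twitter_keys(environ=os.environ):
    """Return the four keys in order, or fail with a clear message."""
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return tuple(environ[name] for name in REQUIRED)
```

The returned tuple can be unpacked into consumer_key, consumer_secret, access_token and access_secret before the authentication step.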

According to the Twitter developer documentation, there is a limit on how many requests a program can make in a given time window. When fetching followers, we have to wait around 15 minutes once the limit is reached; otherwise a rate limit exception shows up:

tweepy.error.TweepError: [{'message': 'Rate limit exceeded', 'code': 88}]
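To get a feel for how long the collection takes, the waiting time can be estimated with some simple arithmetic. The sketch below assumes 20 followers per request and 15 requests per 15-minute window; both figures are assumptions here and should be checked against the current Twitter documentation. The function names are made up for this example.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def estimate_collection(total_followers, page_size, requests_per_window,
                        window_minutes=15):
    """Return (requests needed, minutes spent sleeping)."""
    n_requests = ceil_div(total_followers, page_size)
    windows = ceil_div(n_requests, requests_per_window)
    # The first window starts immediately; every later one costs a full wait.
    return n_requests, max(windows - 1, 0) * window_minutes


# Assumed figures: 20 followers per request, 15 requests per window.
n_requests, minutes = estimate_collection(24027, 20, 15)
print("%d requests, about %d minutes of waiting" % (n_requests, minutes))
```

Under those assumed figures, the 24,027 followers need 1,202 requests and roughly 20 hours of sleeping, which explains the long run of timeout messages in the output.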

Execution Output

hkhrais@Hkhrais:~/Desktop/Tweets$ sudo python Twitter.py
[sudo] password for hkhrais:
We successfully logged in
We got a timeout ... Sleeping for 15 minutes
We got a timeout ... Sleeping for 15 minutes
We got a timeout ... Sleeping for 15 minutes
We got a timeout ... Sleeping for 15 minutes
...
We got a timeout ... Sleeping for 15 minutes
Traceback (most recent call last):
  File "Twitter.py", line 31, in <module>
    u = next(user)
  File "/usr/local/lib/python2.7/dist-packages/tweepy/cursor.py", line 181, in next
    self.current_page = self.page_iterator.next()
  File "/usr/local/lib/python2.7/dist-packages/tweepy/cursor.py", line 64, in next
    raise StopIteration
StopIteration
hkhrais@Hkhrais:~/Desktop/Tweets$

Note that the final exception, StopIteration, indicates that the iteration is complete, which means we have grabbed the whole list of followers’ usernames.
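If the traceback at the end is unwanted, the loop could treat the two cases separately: stop cleanly on StopIteration and sleep only on a rate-limit error. This is a sketch, not the script that produced the output above. RateLimitError is a stand-in name for whatever exception the library raises (the run above shows tweepy.error.TweepError), and the sketch assumes the iterator can be resumed after such an error, as the original script relies on.

```python
import time


class RateLimitError(Exception):
    """Stand-in for the library's rate-limit exception (hypothetical name)."""


def collect_screen_names(followers, out, wait_seconds=15 * 60, sleep=time.sleep):
    """Write one screen name per line until the iterator is exhausted."""
    count = 0
    while True:
        try:
            follower = next(followers)
        except StopIteration:
            break                  # normal end of the follower list
        except RateLimitError:
            sleep(wait_seconds)    # wait out the window, then retry
            continue
        out.write(follower.screen_name + '\n')
        count += 1
    return count
```

Passing the sleep function as a parameter is a design choice that makes the loop easy to test without actually waiting 15 minutes.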

Conclusion

Intelligence gathering requires careful planning, research, and, most importantly, the ability to think like an attacker. With a small Python script (around 25 lines), we could retrieve the usernames of 24,027 followers of @arabbankgroup, which can serve as a good dictionary of usernames. Keep in mind that this script comes in very handy, especially if our target usernames are non-English!

References

• Twitter API

https://dev.twitter.com/docs/twitter-libraries

• Tweepy library

https://pypi.python.org/pypi/tweepy/
