Screen Scraping for Data Exfiltration
Screen scraping, or web scraping, is a technique commonly used by both legitimate businesses and cyber criminals to collect data from web pages.
Whilst screen scraping can be used perfectly legitimately, our research has found that cyber criminals use scraping in a variety of ways in order to steal data and commit fraud.
There is a range of commercial and free tools available for those who want to screen scrape, as well as detailed guidance on what it is and how to build tools.
STORM have uncovered much discussion of screen scraping on a range of dark web forums, and our investigation of many Business Email Compromise (BEC) incidents has led us to examine the use of this technique for the extraction of mailbox data.
Our findings are significant: screen scraping can easily be used to extract mailbox data (messages, attachment data, contacts etc.), and very little is recorded that would allow such activity to be detected.
Is Screen Scraping Legal?
A few articles deal with this issue, including one here. The issue came to the fore after the Cambridge Analytica scandal, when it became clear that screen scraping was one of the collection methods used to build massive data sets. Suffice it to say, if screen scraping is performed in conjunction with any unauthorised access then it is illegal.
How Can Screen Scraping Be Used by Cyber Criminals?
One of the key aims of cyber criminals, if not the key aim, is to commit fraud. There are a series of steps that attackers behind BEC incidents use in order to achieve their goal. Here is a good article on web scraping for fraud.
The first step is often phishing to obtain a user's credentials. This is followed by unauthorised access to the victim's email account and mailbox. Once this has been achieved, the attackers need to assess the risks that could prevent them from achieving their aims: direct fraud and stealing information to sell. In this assessment they need to weigh up the chances of lockout and detection. Lockout may occur as a result of detection or as part of a normal password change by the legitimate user.
So, it follows that timeliness is a key consideration in the minds of successful fraudsters and because of the risk of lockout, they must move quickly to capitalise on the access they have obtained. Often, this means moving the attack 'out-of-band', either completely or partially, from the compromised mailbox. To do this the attackers must first exfiltrate the mailbox data.
There are two principal methods of exfiltration. The first is mailbox synchronization using the stolen credentials, a popular method but one that is more likely to be detected. The second is screen scraping. We have created a demonstration of screen scraping a mailbox here.
Here is another video showing how simple it is to scrape mailbox contacts.
Once the mailbox data has been exfiltrated, the attackers will use it to plan and execute a 'man-in-the-middle' fraud. This combines techniques for intercepting and sending messages from both the original compromised mailbox (often leveraging rules) and fake mail accounts set up with variants of domains used by the victim organisations. The careful use of these techniques helps the fraud to complete successfully for the criminals.
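From a defender's point of view, the domain variants mentioned above are one of the few parts of this stage that can be checked mechanically. The following is a minimal sketch, not a description of any particular product: it compares a sender's domain against a trusted domain and flags near matches. The function names, the substitution table and the 0.85 similarity threshold are all assumptions chosen for illustration, and would need tuning against real mail traffic.

```python
from difflib import SequenceMatcher

# Illustrative (assumed) look-alike substitutions; a real list would be longer.
CONFUSABLES = {"rn": "m", "vv": "w", "0": "o", "1": "l"}


def normalise(domain: str) -> str:
    """Lower-case the domain and collapse common look-alike sequences."""
    d = domain.strip().lower().rstrip(".")
    for fake, real in CONFUSABLES.items():
        d = d.replace(fake, real)
    return d


def is_lookalike(sender_domain: str, trusted_domain: str,
                 threshold: float = 0.85) -> bool:
    """Return True if sender_domain resembles, but is not, trusted_domain."""
    s = sender_domain.strip().lower().rstrip(".")
    t = trusted_domain.strip().lower().rstrip(".")
    if s == t:
        return False  # the genuine domain is not a look-alike of itself
    if normalise(s) == normalise(t):
        return True   # differs only by look-alike characters
    # Fall back to a generic string similarity ratio (0.0 to 1.0).
    return SequenceMatcher(None, s, t).ratio() >= threshold
```

A check like this could be run over the sender domains of inbound mail for each domain an organisation regularly corresponds with. It will produce false positives for genuinely similar domain names, so it is better suited to raising a warning than to blocking mail outright.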
Can Illegal Screen Scraping be Detected?
Detection may occur if the legitimate user, or the counter-parties with whom the fraudsters interact, realise that a scam is in progress. It may also occur if the mail system being used has adequate logging and alerting configured. Our many investigations of BEC incidents show that, unfortunately, this latter safeguard is rarely implemented.
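As an illustration of the kind of alerting meant here, the sketch below flags a sign-in from a country that has not previously been seen for that user. It is a simplified example under stated assumptions: the record keys ("user", "time", "country") are hypothetical and do not correspond to any specific mail platform's log schema, and the records are assumed to be in time order with the country already resolved from the source IP address.

```python
from collections import defaultdict


def flag_new_country_signins(signins, baseline=None):
    """Return sign-in records whose country is new for that user.

    signins:  iterable of dicts with the hypothetical keys
              "user", "time" and "country", in time order.
    baseline: optional dict mapping user -> set of known countries.
    """
    seen = defaultdict(set)
    for user, countries in (baseline or {}).items():
        seen[user].update(countries)

    alerts = []
    for rec in signins:
        known = seen[rec["user"]]
        # A user's very first sign-in only seeds the history; no alert.
        if known and rec["country"] not in known:
            alerts.append(rec)
        # Remember the country so the same location alerts only once.
        known.add(rec["country"])
    return alerts
```

Adding each new country to the user's history after the first alert keeps the output short, at the cost of not re-alerting on repeat access from the same location. Whether that trade-off is acceptable depends on how alerts are triaged.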
However, sometimes the logs are available for investigation post-incident and these may allow us to determine whether mailbox synchronization has taken place.
Evidence of screen scraping is much harder to discover because, at present, mailbox auditing is not sophisticated enough to detect and alert on the difference between the actions of a human user and those of an automated process.
All is not lost, however, as we are often able to detect the unauthorised access through geo-location. Unfortunately, whilst timing analysis on verbosely configured logs might allow us to take a view on the use of screen scraping, it is unlikely to be conclusive.
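To make the idea of timing analysis concrete, the sketch below looks at the gaps between successive message-access events in a single session. A long run of short, very regular gaps is more consistent with an automated process than with a person reading mail. The function names and all three thresholds are assumptions for illustration only, and, as noted above, a result either way is an indicator rather than proof.

```python
from datetime import datetime, timedelta
from statistics import mean, median, pstdev


def timing_profile(timestamps):
    """Summarise the gaps (in seconds) between successive access events."""
    ts = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return None  # too few events to say anything
    avg = mean(gaps)
    return {
        "events": len(ts),
        "median_gap_s": median(gaps),
        # Coefficient of variation: low values mean very regular timing.
        "cv": pstdev(gaps) / avg if avg > 0 else 0.0,
    }


def looks_automated(profile, max_median_gap_s=2.0, max_cv=0.5, min_events=20):
    """Heuristic only: many events, short gaps, little variation."""
    return (
        profile is not None
        and profile["events"] >= min_events
        and profile["median_gap_s"] <= max_median_gap_s
        and profile["cv"] <= max_cv
    )


# Synthetic example: 30 accesses exactly half a second apart.
start = datetime(2024, 1, 1, 9, 0, 0)
scripted = [start + timedelta(seconds=0.5 * i) for i in range(30)]
print(looks_automated(timing_profile(scripted)))  # True
```

A scraper that deliberately randomises its delays would evade these thresholds, which is one reason such analysis is unlikely to be conclusive on its own.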
The Likelihood of Data Breach
Given the above, our demonstration of the ease of screen scraping, and the discussions on hacker forums, it is increasingly likely that this technique is being used to exfiltrate data during BEC incidents. In fact, the low likelihood of detection may make screen scraping a preferred, even de facto, technique in a high proportion of BEC attacks. It certainly means that it will be increasingly hard for victims of unauthorised access to claim, on the grounds of lack of evidence, that a data breach has not occurred.
Quite apart from the fact that many such victims do not have evidence precisely because they did not have logging enabled, the use of screen scraping for data exfiltration now means that they would be unlikely to have compelling evidence of a data breach anyway.