Jan. 13, 2022
Haskayne scholars use artificial intelligence to help detect fraudulent websites
Every year, billions of dollars are lost to online fraud when people enter their credit card or other financial information on a website thinking it’s a credible entity, when it’s not.
“Online fraud and illegitimate websites are a major financial drain on the system,” says Dr. Raymond Patterson, PhD, professor of business technology management (BTM) at the Haskayne School of Business. “From banking to credit cards to regulators, a number of players would love to have a better handle on finding these illegitimate, fraudulent websites. There are a lot of different interests that would desperately benefit from knowing who is a good player and who is a bad player.”
But identifying the bad guys is easier said than done. Unlike a bricks-and-mortar enterprise, a fraudulent website can simply vanish. “It's like playing whack-a-mole with an illegitimate or a fraudulent website,” says Patterson. “Once they're discovered, they'll just go out and get a new URL. It's very hard to just have a list of bad guys. They just move.”
Patterson, Haskayne PhD student Afrouz Hojati and Dr. Ram Gopal, of the University of Warwick, U.K., developed artificial intelligence techniques to help detect illegitimate websites. They built algorithms that could, with further research, provide a first line of defence in identifying potentially fraudulent websites. The research was published in Decision Support Systems.
Research the first to make inroads into generalized fraud detector
The work provides the first steps toward developing consumer protection tools such as an early warning alarm that goes off when visiting a potentially fraudulent website. “This is really one of the first major attempts to make inroads on a generalized fraud detector. A lot of the previous research has been in specific domains,” says Patterson.
The research builds on an algorithm the researchers developed to identify “fake news” websites. “It was a very successful algorithm to detect whether or not a news website was fraudulent or legitimate. It had very high accuracy. But when you take those algorithms that are crafted for a particular context, like news, they don't do as well when you just throw any website at them.”
In this research, the scholars used websites’ third-party request structure and information about the legitimacy of third parties to create real-time machine learning algorithms. “Whenever you go to a website, there are sometimes hundreds of third parties,” says Patterson. “You can have a third party that calls another third party, and then they call a bunch of third parties. And throughout the layers, some of those are actually very nefarious third parties.”
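To make the idea of layered third-party calls concrete, here is a minimal Python sketch. It is an illustration only, not the researchers' published method: the nested-dictionary layout, the `summarize` function, and the `.example` domains are all assumptions made for this example. It counts how many third parties a page pulls in, how many layers deep the calls go, and how many of those parties appear on a list of known bad actors.

```python
def summarize(tree, flagged, depth=1):
    """Return (total third parties, deepest layer, number flagged).

    `tree` is a hypothetical nested dict: each key is a third-party domain
    and each value is the dict of third parties that domain called in turn.
    `flagged` is a hypothetical set of domains already known to be bad.
    """
    total = deepest = bad = 0
    for domain, children in tree.items():
        sub_total, sub_deepest, sub_bad = summarize(children, flagged, depth + 1)
        total += 1 + sub_total
        deepest = max(deepest, depth, sub_deepest)
        bad += (domain in flagged) + sub_bad
    return total, deepest, bad


# Made-up example: the page calls two third parties, one of which
# pulls in two more layers.
page_calls = {
    "ads.example": {"tracker.example": {"shady.example": {}}},
    "cdn.example": {},
}
total, deepest, bad = summarize(page_calls, flagged={"shady.example"})
print(total, deepest, bad)
```

Summary numbers like these are the kind of input a machine learning model could be trained on. Which measures the researchers actually used is not stated in this article.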
Third parties can crash your online party
Patterson compares the third parties to teenagers throwing a party. “You call three or four or 10 of your friends for a party on Friday night and 200 people show up. It’s exactly the same thing that happens with third parties. You might have called a few of your friends, but they called a whole bunch of their friends, and they called a bunch of their friends too.”
The researchers created software that lets them see the call structure (the “who’s calling who”) of those third parties. “Then we can reconstruct the order of call. So in a sense, I know which teenagers invited which kids to mom and dad's house.” The software runs the list of third parties through a standard database to identify whether they are legitimate or not.
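The “who invited whom” reconstruction can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the researchers' software: it assumes the browser traffic has already been recorded as (caller, callee) pairs in the order they occurred, and the function names, the `.example` domains, and the `known_bad` set standing in for the lookup database are all hypothetical.

```python
from collections import defaultdict, deque


def invitation_chains(requests, site):
    """Map each third party to the chain of calls that first brought it in.

    `requests` is a hypothetical log of (caller, callee) pairs in observed order.
    """
    calls = defaultdict(list)
    for caller, callee in requests:
        calls[caller].append(callee)

    chains = {}
    queue = deque([site])
    while queue:
        caller = queue.popleft()
        for callee in calls[caller]:
            if callee != site and callee not in chains:
                # Extend the caller's own chain; the site itself starts every chain.
                chains[callee] = chains.get(caller, [site]) + [callee]
                queue.append(callee)
    return chains


def flag_chains(chains, known_bad):
    """Keep only the chains that end at a third party listed as bad."""
    return {party: chain for party, chain in chains.items() if party in known_bad}


log = [
    ("shop.example", "cdn.example"),
    ("shop.example", "ads.example"),
    ("ads.example", "tracker.example"),
    ("tracker.example", "shady.example"),
]
chains = invitation_chains(log, "shop.example")
print(flag_chains(chains, known_bad={"shady.example"}))
```

In the party analogy, each chain names the guest, the friend who invited that guest, and so on back to the host. That is why a flagged third party can be traced to the call that let it in.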
The researchers’ generalized algorithm works across a broad spectrum of websites in different industries. It’s less costly, less computationally complex, and less time-consuming than existing detection algorithms. And because this third-party sharing information can be observed, it’s hard for the nefarious actors to manipulate or circumvent the algorithm.
“This is a really tough problem,” says Patterson. “I've been thinking about this problem for a long time and I've always referred to it as the holy grail. We finally made headway when we started addressing not the algorithms, but the data structures. How could we represent the data in a different way that would shed a little more light and make it easier for the algorithms to detect. It’s a one-size-fits-all approach.”