Cyber Space

Massive Cryptojacking Campaign

More than 170,000 MikroTik Routers Enslaved

Security researchers have uncovered a massive cryptocurrency mining campaign that hijacked MikroTik routers. The attackers changed the routers’ settings to inject Coinhive, an in-browser cryptocurrency mining script.
MalwareHunterBR, a Brazilian researcher, was the first to discover the attack, which initially appeared in Brazil. Other researchers later found that MikroTik routers all over the world were being targeted.

Zero-Day Exploits
The attackers used a zero-day exploit in the routers’ Winbox component. Although the manufacturer patched the flaw within hours of its discovery, not all router owners have applied the patch.

Simon Kenin, a Trustwave researcher, stated: “After a meticulous analysis of the cryptojacking campaign was conducted, we came to the conclusion that the attackers had essentially modified more than 170,000 MikroTik routers’ configurations in order to inject the Coinhive script.”

According to Kenin, hundreds of thousands of MikroTik routers are in use across the globe, deployed by ISPs as well as by companies and other organizations. Every router serves “tens, if not hundreds of users” on a daily basis.
MikroTik router users are advised to update their firmware immediately to protect themselves against this type of cryptojacking attack.

Voice Recognition or Voice Authentication?
Voice authentication seems like cool technology, but it turns out you can crack it using machine learning and open-source tools, according to John Seymour, a Salesforce Senior Data Scientist, and Azeem Aqil, a Salesforce Software Engineer, who presented their findings.

“With machine learning, voice authentication is becoming ubiquitous,” said Aqil. “You can open your phone by saying the special sentence. Neither Google nor Apple calls this authentication, though. And you can only open a subset of features using voice. We suspect they knew calling it authentication would be a battle.”

The duo’s goal was “to break voice authentication with minimal effort,” Aqil said. “By breaking, we mean gaining access by impersonation. By minimal effort, we mean it shouldn’t require tons of computing; think desktop rather than server farm. It should finish in a reasonable time. And it should require little or no data-science expertise.”

Seymour showed a clip from the movie Sneakers. The hackers fake their way past voice recognition by social-engineering the target into speaking the individual words on tape.

“In practice, this is hard to do,” noted Seymour. “The people you want are busy CEOs, politicians, and others who might not sit down with you. Luckily, there’s text to speech. We don’t care about the sound quality of our audio. It could sound like garbage as long as the recognition software accepts it.”

Common wisdom holds that to create a really good text-to-speech of a person’s voice, you need 24 hours of speech that’s labelled to indicate exactly what’s being said. That doesn’t meet the goal of a hack that finishes in a reasonable time.

“We wanted to do a proof of concept on this idea,” said Seymour. “We used the website LyreBird, founded by pioneers in text-to-speech and machine learning. You create an account, say 30 pre-defined sentences, and give it a text to speak back. It only takes a few minutes.”

When they had the website speak the required phrase, Microsoft’s speech software accepted it.

Of course, you couldn’t get the target of your voice hack to speak those 30 sentences. Aqil and Seymour instead scraped audio from YouTube videos of the unnamed target. They laboriously cleaned up the audio, removing noise and words like “um”. And they transcribed it manually. Then, they fed the result into the open-source Tacotron tool. “You don’t need to understand Tacotron to use it,” noted Aqil.

The scraped audio wasn’t enough to produce a credible fake voice, so they augmented the data by raising and lowering the pitch, effectively creating 30 times as much input. As a sanity check, they tried pitch-modified recordings on Siri and found that it accepted a range roughly between 10 percent slower and 20 percent faster. Even with this augmentation, they got garbage. There just wasn’t enough data.
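
The augmentation step can be illustrated with a minimal Python sketch. It is not the researchers’ tooling, which the talk summary does not specify. It assumes naive resampling, which changes pitch and playback speed together, and it borrows the 0.9 to 1.2 range from the Siri sanity check as hypothetical bounds. The function names `resample` and `augment` are made up for this example.

```python
import math


def resample(samples, factor):
    """Return `samples` played back `factor` times faster (linear interpolation).

    Naive resampling shifts pitch and duration together: factor > 1 gives a
    shorter, higher-pitched clip; factor < 1 gives a longer, lower-pitched one.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if len(samples) < 2:
        return list(samples)
    out_len = max(2, int(round(len(samples) / factor)))
    step = (len(samples) - 1) / (out_len - 1)
    out = []
    for i in range(out_len):
        pos = i * step
        lo = min(int(math.floor(pos)), len(samples) - 1)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def augment(samples, n_variants=30, low=0.9, high=1.2):
    """Return (factor, clip) pairs with factors evenly spaced in [low, high].

    The defaults (30 variants, 0.9 to 1.2) are assumptions for illustration.
    """
    if n_variants < 2:
        raise ValueError("need at least two variants")
    factors = [low + (high - low) * i / (n_variants - 1) for i in range(n_variants)]
    return [(f, resample(samples, f)) for f in factors]


# Example: a synthetic tone stands in for one cleaned-up speech clip.
clip = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
variants = augment(clip)
```

Each source clip yields 30 variants here, matching the “30 times as much input” figure, though how the pair actually chose their variants is not stated.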

It turns out that there are two huge open-source datasets for use in text-to-speech, Blizzard and LJ Speech. When the pair trained their model first on one of these datasets and then switched to their own data, they hit the jackpot. “It’s like training the model where Blizzard teaches the model to speak,” said Seymour, “and training with our data trains it to speak like the target.” Training the model took a day or two, but the result consistently passed the test, breaking into a test account using text-to-speech.
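
The idea of training on a large dataset first and then switching to a small one can be shown with a toy Python sketch. This is not Tacotron: a two-parameter linear model stands in for the speech model, and all data, names, and step counts are invented for illustration. The point is only that a model warm-started from the large dataset fits the small target dataset better, for the same small training budget, than one trained from scratch.

```python
def mse(data, w, b):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(data, w, b, lr=0.5, steps=20):
    """Full-batch gradient descent on mean squared error, starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Large "generic" dataset (plays the role of Blizzard or LJ Speech).
base = [(i / 100, 2 * (i / 100) + 1.0) for i in range(100)]
# Tiny "target" dataset (plays the role of the scraped YouTube audio).
target = [(0.0, 1.5), (0.5, 2.5), (1.0, 3.5)]

# Step 1: pretrain on the large dataset.
pre_w, pre_b = train(base, 0.0, 0.0, steps=200)
# Step 2: fine-tune briefly on the target data, starting from pretrained weights.
tuned = train(target, pre_w, pre_b, steps=20)
# Baseline: the same brief training on the target data, starting from scratch.
scratch = train(target, 0.0, 0.0, steps=20)

tuned_error = mse(target, *tuned)
scratch_error = mse(target, *scratch)
```

With the same 20 steps on the target data, `tuned_error` comes out lower than `scratch_error`, because the pretrained model starts much closer to the answer.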

Don’t Rely on Voice Authentication
“Speaker recognition with unknown words is hard,” said Aqil, “but the passphrase may not be secret. Even if it’s not pre-defined, you speak it out loud. It’s like giving away your password. You should treat voice authentication as only a weak signal on top of multi-factor authentication. Speaker recognition is not the same as speaker authentication.”
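
Aqil’s advice to treat voice as a weak signal can be sketched as a simple login policy in Python. Everything here is hypothetical: the `LoginAttempt` fields, the `decide` function, and the 0.8 threshold are assumptions made for illustration, not any vendor’s API. In this sketch a voice match never grants access on its own, and a poor match only adds scrutiny to a login that has already passed the strong factors.

```python
from dataclasses import dataclass


@dataclass
class LoginAttempt:
    password_ok: bool    # first strong factor
    otp_ok: bool         # second strong factor, e.g. a one-time code
    voice_score: float   # hypothetical 0.0-1.0 similarity from a voice matcher


def decide(attempt, voice_threshold=0.8):
    """Return "deny", "review", or "allow" for a login attempt.

    Voice is only a weak extra signal: it cannot replace either strong
    factor, however high the similarity score is.
    """
    if not (attempt.password_ok and attempt.otp_ok):
        return "deny"
    if attempt.voice_score < voice_threshold:
        return "review"  # strong factors passed, but the voice looks off
    return "allow"


# A perfect voice match alone is still rejected.
spoofed = decide(LoginAttempt(password_ok=False, otp_ok=False, voice_score=0.99))
```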

“Speaker authentication can be broken if the attacker can obtain speech data and knows the correct prompt,” said Seymour. “Data augmentation and transfer learning make the process accessible in a reasonable time. Spoofing someone else’s voice will just become easier.”

Sanjay Gade
