Twitter's machines call on low-paid humans in battle to keep up with hashtags
TWITTER has revealed it has developed a "computation engine" comprising low-paid human workers to categorise the hashtags and topics enjoying sudden spikes in popularity, to help it sell advertising against them.
Computer algorithms are currently unable to understand the meaning of some topics, Twitter said.
In a blog post, it gave the example of one of the recent US presidential debates, in which Mitt Romney said he had “binders full of women” at his disposal when he was picking his senior staff as Governor of Massachusetts.
The strange turn of phrase was instantly mocked online and spawned a #bindersfullofwomen hashtag among Twitter users, which became so popular it was a “trending topic”.
Trending topics are the most popular hashtags, names, or phrases being discussed on Twitter, and so account for a significant proportion of traffic at a given time. Members can set the Twitter website or their client software to display trending topics for a city, a country or the whole world, and the topics can change from one minute to the next.
For computer algorithms, hashtags such as #bindersfullofwomen have “never before been seen, so it's impossible to know without very specific context what they mean”, wrote Edwin Chen, a Twitter data scientist, and Alpa Jain, an advertising specialist for the firm.
“How would you know that #bindersfullofwomen refers to politics, and not office accessories?
“Since these spikes in search queries are so short-lived, there’s only a small window of opportunity to learn what they mean.”
Understanding what trending topics mean is crucial for Twitter’s growing advertising business, as advertisers typically want to target their messages to groups of members interested in particular areas, such as politics.
In response to the problem, Twitter revealed it had hired its new “human computation engine” via Mechanical Turk, an Amazon service that links employers with home workers willing to do typically mundane tasks that computers remain incapable of, such as accurately transcribing speech.
“We’ve built a real-time human computation engine to help us identify search queries as soon as they're trending, send these queries to real humans to be judged, and then incorporate the human annotations into our back-end models,” it explained.
“As soon as we discover a new popular search query, we send it to our human evaluators, who are asked a variety of questions about the query.
“For example: as soon as we notice "Big Bird" spiking, we may ask judges on Mechanical Turk to categorize the query, or provide other information that helps us serve relevant Tweets and ads.”
The system is designed to work in real time, so that it can respond to shifts in meaning. For instance, normally the phrase “Big Bird” might refer to television or ornithology, but during the presidential debates it became a political topic, as Mr Romney had pledged to cut funding to PBS, the broadcaster of Sesame Street.
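The flow Twitter describes (spot a spiking query, send it to human judges, fold their answers back in) can be illustrated with a rough sketch. Twitter did not publish its method in the detail shown here, so everything below is an assumption made for illustration: the function names, the spike thresholds and the majority-vote rule are all hypothetical.

```python
from collections import Counter


def is_spiking(history, current, ratio=5.0, min_count=100):
    """Flag a query whose current count far exceeds its recent average.

    `history` holds counts from earlier time windows. The `ratio` and
    `min_count` thresholds are illustrative guesses, not Twitter's values.
    """
    baseline = sum(history) / len(history) if history else 0.0
    return current >= min_count and current > ratio * max(baseline, 1.0)


def majority_label(judgments):
    """Return the category chosen most often by the human judges."""
    label, _ = Counter(judgments).most_common(1)[0]
    return label


# Hypothetical counts for the query "Big Bird" in successive windows.
history = [3, 5, 4, 6]
current = 900

category = None
if is_spiking(history, current):
    # In a real system these answers would come back from human judges;
    # here they are hard-coded stand-ins.
    judgments = ["politics", "politics", "television"]
    category = majority_label(judgments)

print(category)
```

Taking a majority vote over several judges is one simple way to smooth out individual disagreement; a production system might weight judges by past accuracy instead.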
Twitter did not reveal how many people its “human computation engine” employs but said it relied on an elite “small custom pool”, “culled from the best of Mechanical Turk”.
“Our custom pool of judges work virtually all day,” it said. “For many of them, this is a full-time job, and they're geographically distributed, so our tasks complete quickly at all hours.”
The firm has also declined to say how much it paid its Mechanical Turk workers. Rates are typically low, however. The task with the highest total pay on the service at time of writing was transcribing and tagging two and a quarter hours of video for $33.57.
Professional transcribing firms estimate that an hour of video takes four hours to transcribe, so the task represents about nine hours of work. Assuming home workers on Mechanical Turk could match that speed, they would be on a rate of $3.73 per hour.
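The hourly figure can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those quoted above, and the four-to-one ratio is the professional firms' estimate rather than a measured Mechanical Turk speed.

```python
total_pay = 33.57                # dollars offered for the whole task
video_hours = 2.25               # two and a quarter hours of video
work_hours_per_video_hour = 4    # professional firms' estimate

# Total working time implied by the estimate: 2.25 * 4 = 9 hours.
work_hours = video_hours * work_hours_per_video_hour

# Effective pay per hour of work.
hourly_rate = total_pay / work_hours

print(f"{work_hours:.0f} hours of work at ${hourly_rate:.2f} per hour")
```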
Christopher Williams, Telegraph.co.uk