Setup

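from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from fastai.text.all import *
# TransformersTextBlock, TextGetter, TransLearner, RougeScore, GeneratePreds and
# display_validation_results are not part of the packages above; they are assumed
# to come from the helper library this notebook was written against.

model_name = "t5-small"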

max_len = 512
bs = 16
val_bs = bs*2

lr = 2e-5
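# Full dataset (disabled here): datasets = load_dataset("xsum")
# Only the XSum validation split is loaded; it is split into train/valid parts below.
train_ds = load_dataset("xsum", split='validation')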

Dataloaders

tokenizer = AutoTokenizer.from_pretrained(model_name)
# train_ds = concatenate_datasets([datasets['train'], datasets['validation']])
# Random train/valid index split (fastai default: 20% of the items go to validation)
splits = RandomSplitter()(train_ds)

@ItemTransform
def untuple1(x):
    # Return the elements of the nested tuple x[0] as a flat, top-level tuple
    return (*x[0], )

dblock = DataBlock(
    blocks = [TransformersTextBlock(tokenizer=tokenizer, do_targets=True, with_labels=True)],
    get_x=TextGetter('document', 'summary', prefix1='summarize: '),
    item_tfms=untuple1,
    splitter=IndexSplitter(splits[1]))
%%time
dls = dblock.dataloaders(train_ds, bs=bs, val_bs=val_bs, shuffle=True)
CPU times: user 6.79 s, sys: 1.34 s, total: 8.13 s
Wall time: 15.6 s
dls.show_batch(max_n=4)
text | text_
0 | summarize: Rowling, who was made an OBE in 2001, has now become a member of the elite Order of the Companions of Honour. Judy Murray, mother of world tennis number one Sir Andy, was made an OBE. And Brit Award winner Sande, who was raised in Alford, Aberdeenshire, was made an MBE. Scottish comedian Billy Connolly was given a knighthood. Judy Murray adds to the honours already handed to her two sons. The tennis coach and former captain of Great Britain's Fed Cup team is being recognised for her work to grow the sport and for encouraging more women into sport. Sir Andy Murray, was knighted in the New Year Honours List, while his brother Jamie was made an OBE last year. Harry Potter author Rowling, who was made an OBE in 2001, becomes a member of the Order of the Companions of Honour, which has a maximum of 65 | Tennis coach Judy Murray, pop star Emeli Sande and Harry Potter author JK Rowling are among the well-known faces in Scotland to have received awards in the Queen's Birthday Honours.
1 | summarize: Rooney, 31, played 559 times for United, scoring 253 goals. He won five Premier League titles and each of the Champions League, Europa League and FA Cup once after joining from Everton for £27m in 2004. Rooney, who has signed a two-year deal, said he was "ecstatic" and his "first game back will be an emotional day". "It's a great feeling to be back. I cannot wait to meet the lads, get on the training pitch and then get on the pitch to play," he added. Rooney's return comes as United look set to sign Everton striker Romelu Lukaku, with a £75m deal for the Belgium international agreed between the two clubs. Everton confirmed Rooney will wear the number 10 shirt previously worn by Lukaku. "I'm not just coming back because it's the team I support, the team I grew up playing for - I'm coming back because I | Manchester United record goalscorer Wayne Rooney has rejoined Everton for an undisclosed fee, 13 years after leaving the Merseyside club.
2 | summarize: The last week of Yvette Cooper's leadership campaign isn't exactly how she predicted it. She has spent more time in Parliament than on the stump. As shadow home secretary she has been leading her party's response to the refugee crisis - initially calling on the government to do more, and now pressing for some of those families who have already fled to Europe to be resettled here. This has helped, rather than harmed, her leadership chances but has also meant more conventional campaigning has had to be discarded. "I tend to respond most strongly to the most serious issues but I'm trying to keep this apart from the leadership contest, and trying to get cross party agreement to an appropriate response to the refugee crisis," she tells me. While I met her rivals "on the road" at rallies and meetings, we meet in her parliamentary office. She was on | In the fourth of a series of in-depth profiles of the Labour leadership candidates, Iain Watson catches up with Yvette Cooper.
3 | summarize: Media playback is not supported on this device The home team took the lead when right-back Ben Marshall tucked in a low shot. Victor Moses's equaliser squirmed in and Payet's free-kick put West Ham ahead before Blackburn were reduced to 10 men by Chris Taylor's red card. Emmanuel Emenike converted twice either side of Cheikhou Kouyate's dismissal, then Payet added a superb solo goal. The Hammers, who last won the FA Cup in 1980, will travel to either Shrewsbury or Manchester United in the last eight. It is only the second time in the past 10 seasons that they have reached the quarter-finals. Read how the action unfolded at Ewood Park West Ham have not won a major trophy for 36 years but, in their final season at the Boleyn Ground, are hoping that they can take the FA Cup with them to the Olympic Stadium. Hammers manager Slaven | Dimitri Payet produced a virtuoso performance as West Ham fought back to ease into the FA Cup quarter-finals at Championship side Blackburn.

Training

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
learn = TransLearner(dls, model, loss_func=noop)
learn.add_cb(RougeScore(tokenizer));
learn.validate()
(#5) [3.887169599533081,0.2408951647750222,0.056876429032900334,0.21799074098609333,0.2180012913860141]
learn.fit_one_cycle(2, 1e-4)
epoch train_loss valid_loss rouge1 rouge2 rougeL rougeLsum time
0 2.848689 2.597462 0.297854 0.100571 0.271334 0.278742 03:19
1 2.711888 2.562764 0.302310 0.103245 0.275056 0.282703 03:19
%%time
res = learn.validate()
display_validation_results(res)
valid_loss rouge1 rouge2 rougeL rougeLsum
0 2.562764 0.302342 0.103331 0.274998 0.282674
CPU times: user 17.6 s, sys: 391 ms, total: 18 s
Wall time: 19.8 s

So far, predictions were computed in a single forward pass, so the token generated at timestep $t$ had access to the reference tokens $0:t-1$ (teacher forcing). At inference time, however, the model generates autoregressively: previously generated tokens are used to produce the next one. Let's evaluate the model with this more realistic procedure. This can be done by adding the GeneratePreds callback:

%%time
res = learn.validate(cbs=GeneratePreds())
display_validation_results(res)
valid_loss rouge1 rouge2 rougeL rougeLsum
0 2.562764 0.26428 0.067737 0.209218 0.209238
CPU times: user 40.5 s, sys: 515 ms, total: 41 s
Wall time: 42.9 s
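
The ROUGE scores are lower with generation than with the single forward pass, which is what one would expect when an early mistake is fed back in as input to later steps. The toy sketch below illustrates the effect. The lookup table NEXT and both helper functions are hypothetical stand-ins made up for illustration; they are not the T5 model or the GeneratePreds callback.

```python
# Toy next-token "model": maps the previous token to the predicted next token.
# Hypothetical illustration only; it contains one deliberate mistake.
NEXT = {
    "<s>": "apple",
    "apple": "has",
    "has": "sued",          # mistake: the reference continues with "offered"
    "offered": "refunds",
    "sued": "regulators",
    "refunds": "</s>",
    "regulators": "</s>",
}

def teacher_forced(reference):
    """Predict token t from the *reference* tokens up to t-1 (one pass)."""
    previous = ["<s>"] + reference[:-1]
    return [NEXT[tok] for tok in previous]

def autoregressive(max_len=10):
    """Predict token t from the model's *own* previous prediction."""
    out, prev = [], "<s>"
    for _ in range(max_len):
        nxt = NEXT[prev]
        if nxt == "</s>":
            break
        out.append(nxt)
        prev = nxt
    return out

def matches(pred, ref):
    """Count positions where prediction and reference agree."""
    return sum(p == r for p, r in zip(pred, ref))

reference = ["apple", "has", "offered", "refunds"]

tf = teacher_forced(reference)
ar = autoregressive()
print(tf, matches(tf, reference))  # ['apple', 'has', 'sued', 'refunds'] 3
print(ar, matches(ar, reference))  # ['apple', 'has', 'sued', 'regulators'] 2
```

With teacher forcing the single mistake costs one position, because the next step is conditioned on the reference token again. In free-running generation the mistake becomes the next input, so the continuation drifts away from the reference.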
sample = train_ds[0]
document_text = ' '.join(sample['document'].split('\n'))
print(f"Document:\n{document_text}")
print(f"\nReference summary: {sample['summary']}")
inp = tokenizer('summarize: '+sample['document'], return_tensors='pt')
pred = learn.generate(inp['input_ids'].to(dls.device))
out = tokenizer.decode(pred[0].cpu(), skip_special_tokens=True)
print(f"\nPredicted summary: {out}")
Document:
The country's consumer watchdog has taken Apple to court for false advertising because the tablet computer does not work on Australia's 4G network. Apple's lawyers said they were willing to publish a clarification. However the company does not accept that it misled customers. The Australian Competition and Consumer Commission (ACCC) said on Tuesday: "Apple's recent promotion of the new 'iPad with wi-fi + 4G' is misleading because it represents to Australian consumers that the product can, with a sim card, connect to a 4G mobile data network in Australia, when this is not the case." The watchdog then lodged a complaint at the Federal Court in Melbourne. At a preliminary hearing, Apple lawyer Paul Anastassiou said Apple had never claimed the device would work fully on the current 4G network operated by Telstra. Apple says the new iPad works on what is globally accepted to be a 4G network. The matter will go to a full trial on 2 May. The Apple iPad's third version went on sale earlier this month, with Australia the first country where it was available. Shoppers lined up by the hundreds at Apple stores on opening day and the company said it had been its strongest iPad launch to date. The ACCC said it was seeking an injunction on sales as well as a financial penalty against Apple, corrective advertising and refunds to consumers. On its website, Apple does state that 4G LTE is only supported on selected networks in the US and Canada.

Reference summary: US technology firm Apple has offered to refund Australian customers who felt misled about the 4G capabilities of the new iPad.

Predicted summary: Apple has filed a complaint against the Australian consumer watchdog for misleading advertising on its new iPad
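
To make the rouge1 column more concrete, here is a simplified sketch of a unigram-overlap F1 score applied to the reference and predicted summaries above. It is only an approximation under stated assumptions: rouge1_f1 is a hypothetical helper that lowercases and splits on whitespace, whereas the metric behind RougeScore most likely applies its own tokenization (and possibly stemming), so its numbers will not match this sketch exactly.

```python
from collections import Counter

def rouge1_f1(reference: str, prediction: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    pred_counts = Counter(prediction.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both
    overlap = sum((ref_counts & pred_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

reference = ("US technology firm Apple has offered to refund Australian customers "
             "who felt misled about the 4G capabilities of the new iPad.")
prediction = ("Apple has filed a complaint against the Australian consumer watchdog "
              "for misleading advertising on its new iPad")

print(f"simplified ROUGE-1 F1: {rouge1_f1(reference, prediction):.3f}")
```

A score like this rewards shared words regardless of meaning: the prediction reuses many words from the reference even though it reverses who took whom to court, which is worth keeping in mind when reading the ROUGE tables.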