AWS' launches Textract tool capable of reading millions of files in a few hours

The machine learning-powered tool promises to be the most accurate for scalping data

AWS logo on black background

AWS has said that its Textract tool, designed to extract and translate data between files, is now generally available for all customers.

The tool, which is a machine learning-driven feature of its cloud platform, lets customers autonomously extract data from documents and accurately convert it into a usable format, such as exporting contractual data into database forms.

The fully-managed tool requires no machine learning knowledge to use and works in virtually any document. Industries that work with specific file types such as financial services, insurance and healthcare will also be able to plug these into the tool.

Textract aims to expedite the laborious data entry process that is also often inaccurate when using other third-party software. Amazon claims it can accurately analyse millions of documents in "just a few hours".

Advertisement
Advertisement - Article continues below

"Many companies extract text and data from files such as contracts, expense reports, mortgage guarantees, fund prospectuses, tax documents, hospital claims, and patient forms through manual data entry or simple OCR software," the company said.

"This is a time-consuming and often inaccurate process that produces an output requiring extensive post-processing before it can be put in a format that is usable by other applications," it added.

Textract takes data from scanned files stored in Amazon S3 buckets, reads them and returns data in JSON text annotated with the page number, section, form labels, and data types.

PwC is already using the tool for its pharmaceutical clients, an industry that commonly uses processes that involve Food and Drug Administration (FDA) forms that would otherwise require hours to complete, according to Siddhartha Bhattacharya, director lead, healthcare AI at PwC.

"Previously, people would manually review, edit, and process these forms, each one taking hours," he said. "Amazon Textract has proven to be the most efficient and accurate OCR solution available for these forms, extracting all of the relevant information for review and processing, and reducing time spent from hours to down to minutes."

The Met Office is another organisation that plans to implement Textract, making use of old weather records.

"We hope to use AmazonTextract to digitise millions of historical weather observations from document archives," said Philip Brohan, climate scientist at the Met Office. "Making these observations available to science will improve our understanding of climate variability and change."

Featured Resources

The IT Pro guide to Windows 10 migration

Everything you need to know for a successful transition

Download now

Managing security risk and compliance in a challenging landscape

How key technology partners grow with your organisation

Download now

Software-defined storage for dummies

Control storage costs, eliminate storage bottlenecks and solve storage management challenges

Download now

6 best practices for escaping ransomware

A complete guide to tackling ransomware attacks

Download now
Advertisement

Most Popular

Visit/security/identity-and-access-management-iam/354289/44-million-microsoft-customers-found-using
identity and access management (IAM)

44 million Microsoft customers found using compromised passwords

6 Dec 2019
Visit/cloud/microsoft-azure/354230/microsoft-not-amazon-is-going-to-win-the-cloud-wars
Microsoft Azure

Microsoft, not Amazon, is going to win the cloud wars

30 Nov 2019
Visit/hardware/354237/five-signs-that-its-time-to-retire-it-kit
Sponsored

Five signs that it’s time to retire IT kit

29 Nov 2019
Visit/business/business-strategy/354195/where-modernisation-and-sustainability-meet-a-tale-of-two
Sponsored

Where modernisation and sustainability meet: A tale of two benefits

25 Nov 2019