Documents with repeating data structures, such as invoices, purchase orders, and shipping manifests, are hard to extract information from if your tooling can't handle arrays. ARRAY fields in document parsing APIs are designed for exactly this case.
What are ARRAY Fields?
An ARRAY field captures and structures repeating data from tables or lists in a document. Instead of extracting each row as a separate entity, you define the schema of a single row, and every row that matches that pattern is extracted into one structured output.
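Here is a minimal Python sketch of the "define one row, extract every row" pattern. It does not depend on any particular parsing API. It assumes the table cells have already been read (for example by OCR) into lists of strings. `extract_array`, `RowSchema`, and the sample rows are hypothetical names used only for illustration.

```python
from typing import Any, Callable

# Hypothetical row schema: one (field_name, converter) pair per column.
RowSchema = list[tuple[str, Callable[[str], Any]]]


def extract_array(rows: list[list[str]], row_schema: RowSchema) -> list[dict[str, Any]]:
    """Apply a single row schema to every row and collect the results in one list."""
    items = []
    for row in rows:
        # A row that does not match the schema is rejected rather than mis-mapped.
        if len(row) != len(row_schema):
            raise ValueError(f"row has {len(row)} cells, schema expects {len(row_schema)}")
        # Strip whitespace from each cell, then convert it with its column's converter.
        items.append({name: convert(cell.strip()) for (name, convert), cell in zip(row_schema, row)})
    return items


schema: RowSchema = [("description", str), ("quantity", int)]
rows = [["Bolt", "10"], ["Nut", " 25 "]]
print(extract_array(rows, schema))
```

The schema is declared once, so a longer table needs no change to the extraction code.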
Example: Extracting Invoice Line Items
Let's consider an invoice document with line items:
| Item | Description | Quantity | Price |
|---|---|---|---|
| 1 | Widget A | 5 | $20.00 |
| 2 | Widget B | 3 | $15.00 |
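The goal of an ARRAY field is a single structured value that holds every row. The sketch below builds that value by hand from the markdown version of the table above. The row keys (`item`, `description`, `quantity`, `price`) and the helper names are assumptions made for this example. They are not a schema taken from any specific API.

```python
from decimal import Decimal

TABLE = """\
| Item | Description | Quantity | Price |
|---|---|---|---|
| 1 | Widget A | 5 | $20.00 |
| 2 | Widget B | 3 | $15.00 |
"""


def parse_price(text: str) -> Decimal:
    # Strip the "$" sign and any thousands separators before parsing.
    return Decimal(text.replace("$", "").replace(",", ""))


def parse_line_items(table: str) -> dict[str, list[dict]]:
    lines = [ln.strip() for ln in table.strip().splitlines()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    items = []
    for line in lines[2:]:  # skip the header and the |---| separator row
        cells = [c.strip() for c in line.strip("|").split("|")]
        row = dict(zip(header, cells))
        items.append({
            "item": int(row["Item"]),
            "description": row["Description"],
            "quantity": int(row["Quantity"]),
            "price": parse_price(row["Price"]),
        })
    # All rows end up in one list under a single key.
    return {"line_items": items}


print(parse_line_items(TABLE))
```

Prices are parsed as `Decimal` rather than `float`, so amounts such as `$20.00` keep exact cents.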
To extract this data using an ARRAY field, you would define a schema that matches the structure of each row:
```json
{
  "fields": [
    {
      "name": "line_items",
      "type": "ARRAY",
      …
    }
  ]
}
```

[Read the full article at DEV Community](https://dev.to/iterationlayer/extracting-structured-data-from-scanned-documents-ocr-plus-field-validation-1i30)