Commit 7fa0a8d

Merge pull request #1674 from oracle-devrel/ao-smart-invoice-extraction
smart invoice extraction tool
2 parents 7481c8f + f7ef3f3 commit 7fa0a8d

3 files changed

+300
-0
lines changed

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
# 🧾 Invoice Analysis Plus

An intelligent invoice data extractor built with **OCI Generative AI**, **LangChain**, and **Streamlit**. Upload an invoice PDF and the app extracts structured fields such as REF. NO., POLICY NO., and dates using multimodal LLMs.

---

## 🚀 Features

- 🔍 Automatically identifies key invoice headers with an OCI Generative AI vision model (Llama 3.2 90B Vision)
- 🤖 Lets you choose which elements to extract and set a data type for each
- 🧠 Leverages a text-based LLM (Cohere Command R+) for context-aware value extraction
- 🧪 Outputs data as clean **JSON** and saves it to **CSV**
- 🖼️ Sends each page to the vision model as a base64-encoded image embedded in the prompt

---

## 🛠️ Tech Stack

| Tool                 | Usage                                           |
|----------------------|-------------------------------------------------|
| 🧠 OCI Generative AI | Vision + Text LLMs for extraction               |
| 🧱 LangChain         | Prompt orchestration and LLM chaining           |
| 📦 Streamlit         | Interactive UI and file handling                |
| 🖼️ pdf2image         | Convert PDFs into JPEGs                         |
| 🧾 Pandas            | CSV creation & table rendering                  |
| 🔐 Base64            | Encodes image bytes for embedding in the prompt |

---

## 🧠 How It Works

1. **User Uploads Invoice PDF**
   The file is uploaded and converted into an image using `pdf2image`. Upload single-page documents only.

2. **Initial Header Detection (Llama 3.2 Vision)**
   The first page is passed to the multimodal LLM, which returns a list of fields that are likely to be useful (e.g., "Policy No.", "Amount", "Underwriter").

3. **User Selects Fields and Types**
   A UI lets the user pick 3 fields from the detected list and specify their data types (Text, Number, etc.).

4. **Prompt Generation (Cohere Command R+)**
   The second LLM generates a custom system prompt to extract those fields as JSON.

5. **Full Invoice Extraction (Llama 3.2 Vision)**
   Each page image is passed to the multimodal LLM with the custom prompt, which returns JSON values for the requested fields.

6. **Data Saving & Display**
   All data is shown in a `st.dataframe()` and saved to CSV.

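
Steps 2 and 5 both send a page image to the vision model and read structured text back. The sketch below illustrates those two pieces with the standard library only. The helper names `build_image_content` and `parse_first_json_object` are hypothetical and are not functions from `app.py`. The message layout mirrors the one the app uses, and the JSON parsing shown here is one possible approach that tolerates extra text around the object.

```python
import base64
import json


def build_image_content(image_bytes: bytes, text: str = "This is my invoice") -> list:
    """Build multimodal message content: a text part plus a base64 JPEG data URL."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return [
        {"type": "text", "text": text},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
    ]


def parse_first_json_object(reply: str) -> dict:
    """Return the first JSON object found in a model reply, ignoring surrounding text."""
    start = reply.find("{")
    if start == -1:
        raise ValueError("no JSON object found in reply")
    # raw_decode parses exactly one JSON value and stops there, so trailing
    # text is tolerated and nested braces are handled correctly.
    obj, _end = json.JSONDecoder().raw_decode(reply[start:])
    return obj
```
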
---

## 📁 File Structure

```bash
.
├── app.py              # Main Streamlit app
├── requirements.txt    # Python dependencies
└── README.md           # This file
```

---

## 🔧 Setup

1. **Clone the repository**

   ```bash
   git clone <repository-url>
   cd <repository-folder>
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Run the app**

   ```bash
   streamlit run app.py
   ```

> ⚠️ **Important Configuration:**
>
> - Replace all instances of `<YOUR_COMPARTMENT_OCID_HERE>` with your actual **OCI Compartment OCID**
> - Ensure you have access to **OCI Generative AI Services** with correct permissions
> - Update model IDs in the code if needed:
>   - Vision model: `meta.llama-3.2-90b-vision-instruct`
>   - Text model: `cohere.command-r-plus-08-2024`

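
One way to avoid editing the placeholder in several places is to read the compartment OCID from an environment variable. This is a minimal sketch, not part of the app: the variable name `OCI_COMPARTMENT_OCID` and the helper `get_compartment_id` are assumptions made for illustration, and neither is defined by `app.py` or by OCI.

```python
import os

# Placeholder string used in app.py; treated here as "not configured".
PLACEHOLDER = "<YOUR_COMPARTMENT_OCID_HERE>"


def get_compartment_id(env_var: str = "OCI_COMPARTMENT_OCID") -> str:
    """Read the compartment OCID from the environment and fail early if it is unset."""
    value = os.environ.get(env_var, "").strip()
    if not value or value == PLACEHOLDER:
        raise RuntimeError(f"Set {env_var} to your OCI compartment OCID before running the app")
    return value
```
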
---

## 📁 Output Sample

```json
[
  {
    "REF. NO.": "IN123456",
    "INSURED": "Acme Corp",
    "POLICY NO.": "POL987654",
    "File Name": "invoice1.pdf",
    "Page Number": 1
  },
  ...
]
```
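
Records shaped like the sample above map directly onto CSV rows. The app itself writes the file with pandas. The sketch below swaps in the standard-library `csv` module to show the same idea without extra dependencies; `records_to_csv` is a hypothetical helper name, and using the union of all keys as the header is an assumption about how pages with different fields should be handled.

```python
import csv
import io


def records_to_csv(records: list) -> str:
    """Serialize a list of dicts to CSV text, using the union of keys as the header."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order
    buffer = io.StringIO()
    # restval fills in an empty cell when a record lacks one of the columns.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Calling `records_to_csv` on the list from the sample would give one header row followed by one row per page, with `File Name` and `Page Number` as the last two columns.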
Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
import ast
import base64
import io
import json

import pandas as pd
import streamlit as st
from langchain_community.chat_models.oci_generative_ai import ChatOCIGenAI
from langchain_core.messages import HumanMessage, SystemMessage
from pdf2image import convert_from_bytes

# Helper function to convert a list of images into in-memory byte streams for further processing
def save_images(images, output_format="JPEG"):
    image_list = []
    for image in images:
        img_byte_arr = io.BytesIO()
        image.save(img_byte_arr, format=output_format)
        img_byte_arr.seek(0)
        image_list.append(img_byte_arr)
    return image_list

# Helper function to encode an image file to base64 for sending to the LLM
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Save extracted data to a CSV file and show a success message in Streamlit
def save_to_csv(data, file_name="extracted_data.csv"):
    df = pd.DataFrame(data)
    df.to_csv(file_name, index=False)
    st.success(f"Data saved to {file_name}")

# Extract key headers from the first image of a PDF invoice
def extractor(image_list):
    # Replace this with your own compartment ID
    compID = "<YOUR_COMPARTMENT_OCID_HERE>"

    # Load a multimodal LLM for invoice header analysis
    llm = ChatOCIGenAI(
        model_id="meta.llama-3.2-90b-vision-instruct",  # Replace with your model ID
        compartment_id=compID,
        model_kwargs={"max_tokens": 2000, "temperature": 0},
    )

    # Encode the first page as base64
    encoded_frame = base64.b64encode(image_list[0].getvalue()).decode("utf-8")

    with st.spinner("Extracting the key elements"):
        # Provide system instruction to extract headers from invoice
        system_message = SystemMessage(
            content="""Given this invoice, extract in list format, all the headers that can be needed for analysis
For example: ["REF. NO.", "INSURED", "REINSURED", "POLICY NO.", "TYPE", "UNDERWRITER REF. NO.", "PERIOD", "PARTICULARS", "PPW DUE DATE"]
Return the answer in a list format, and include nothing else at all in the response.
"""
        )

        # Human message includes the image
        human_message = HumanMessage(
            content=[
                {"type": "text", "text": "This is my invoice"},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encoded_frame}"}},
            ]
        )

        # Invoke the LLM (system instruction first) and extract elements
        ai_response = llm.invoke(input=[system_message, human_message])
        st.caption("Here are some key elements you may want to extract")
        # literal_eval only accepts Python literals, so model output is never executed as code
        return ast.literal_eval(ai_response.content.strip())

# Main Streamlit app function
def invoiceAnalysisPlus():
    st.title("Invoice Data Extraction")

    with st.sidebar:
        st.title("Parameters")
        # Replace with your own compartment ID
        compID = "<YOUR_COMPARTMENT_OCID_HERE>"
        user_prompt = st.text_input("Input the elements you are looking to extract here")
        st.caption("Our AI assistant has extracted the following key elements from the invoice. Please select the elements you wish to extract.")

    uploaded_file = st.file_uploader("Upload your invoices here:", type=["pdf"])

    if uploaded_file is not None:
        with st.spinner("Processing..."):
            # Convert PDF to image list; getvalue() returns the whole file regardless of read position
            if uploaded_file.type == "application/pdf":
                images = convert_from_bytes(uploaded_file.getvalue(), fmt="jpeg")
            else:
                images = [convert_from_bytes(uploaded_file.getvalue(), fmt="jpeg")[0]]

            # Save as byte streams
            image_list = save_images(images)

        # Load both image-based and text-based LLMs
        llm = ChatOCIGenAI(
            model_id="meta.llama-3.2-90b-vision-instruct",  # Replace with your model ID
            compartment_id=compID,
            model_kwargs={"max_tokens": 2000, "temperature": 0},
        )
        llm_for_prompts = ChatOCIGenAI(
            model_id="cohere.command-r-plus-08-2024",  # Replace with your model ID
            compartment_id=compID,
            model_kwargs={"max_tokens": 2000, "temperature": 0},
        )

        # Select box UI for user to pick elements and their data types
        data_types = ["Text", "Number", "Percentage", "Date"]
        elements = []

        if "availables" not in st.session_state:
            st.session_state.availables = extractor(image_list)
        availables = st.session_state.availables

        for i in range(3):  # Max 3 fields
            col1, col2 = st.columns([2, 1])
            # Clamp the default index so fewer than 3 detected headers does not raise
            default_index = min(i, max(len(availables) - 1, 0))
            with col1:
                name = st.selectbox(f"Select an element {i+1}", availables, key=f"name_{i}", index=default_index)
            with col2:
                data_type = st.selectbox(f"Type {i+1}", data_types, key=f"type_{i}")
            elements.append((name, data_type))

        # Generate appropriate prompt based on selected or input fields.
        # `elements` always holds 3 entries, so the free-text input decides which branch runs.
        if not user_prompt.strip():
            field_list = ", ".join(f"- {name}" for name, _ in elements)
            system_message = SystemMessage(
                content=f"""
                Based on the following set of elements {elements}, with their respective types, extract their values and respond only in valid JSON format (no explanation):
                {field_list}
                For example:
                {{
                    "{elements[0][0]}": "296969",
                    "{elements[1][0]}": "296969",
                    "{elements[2][0]}": "296969"
                }}
                """
            )
        else:
            system_message_cohere = SystemMessage(
                content=f"""
                Generate a system prompt to extract fields based on user-defined elements: {user_prompt}.
                Output should be JSON only. No other text.
                """
            )
            ai_response_cohere = llm_for_prompts.invoke(input=[system_message_cohere])
            system_message = SystemMessage(content=ai_response_cohere.content)

        # Extracted data list
        extracted_data = []

        with st.spinner("Analyzing invoice..."):
            for idx, img_byte_arr in enumerate(image_list):
                try:
                    encoded_frame = base64.b64encode(img_byte_arr.getvalue()).decode("utf-8")

                    human_message = HumanMessage(
                        content=[
                            {"type": "text", "text": "This is my invoice"},
                            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encoded_frame}"}},
                        ]
                    )

                    ai_response = llm.invoke(input=[system_message, human_message])

                    # Parse the first JSON object in the reply; raw_decode handles nested braces
                    json_start = ai_response.content.find("{")
                    if json_start == -1:
                        raise ValueError("no JSON object found in the model response")
                    response_dict, _ = json.JSONDecoder().raw_decode(ai_response.content[json_start:])

                    response_dict["File Name"] = uploaded_file.name
                    response_dict["Page Number"] = idx + 1
                    extracted_data.append(response_dict)

                except Exception as e:
                    st.error(f"Error processing page {idx+1}: {str(e)}")

        # Display and save results
        if extracted_data:
            save_to_csv(extracted_data)
            st.dataframe(pd.DataFrame(extracted_data))

# Run the app
if __name__ == "__main__":
    invoiceAnalysisPlus()
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
pandas==1.5.3
langchain==0.0.209
langchain_community==0.0.1
streamlit==1.24.0
oci==2.97.0
pdf2image==1.16.3
Pillow==8.4.0
