Commit 537d057

948910: Extract Text Option
1 parent a8e01ab commit 537d057

10 files changed

+582
-3
lines changed

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
1+
---
2+
layout: post
3+
title: extractTextCompleted Event in EJ2 ASP.NET MVC PDF Viewer | Syncfusion
4+
description: Learn here all about extractTextCompleted Event in ASP.NET MVC PDF Viewer component of Syncfusion Essential JS 2 and more.
5+
platform: ej2-asp-core-mvc
6+
control: PDF Viewer
7+
publishingplatform: ##Platform_Name##
8+
documentation: ug
9+
---
10+
11+
# Extract text using extractTextCompleted Event
12+
13+
The PDF Viewer library allows you to extract the text from a page along with its bounds. Text extraction is performed using the **isExtractText** property and the [**extractTextCompleted**](https://help.syncfusion.com/cr/aspnetcore-js2/Syncfusion.EJ2.PdfViewer.PdfViewer.html#Syncfusion_EJ2_PdfViewer_PdfViewer_ExtractTextCompleted) event. The `extractTextCompleted` event triggers when text extraction is completed in the PDF Viewer.
14+
15+
Here is an example of how you can use the extractTextCompleted event:
16+
17+
{% tabs %}
18+
{% highlight cshtml tabtitle="Standalone" %}
19+
20+
@using Syncfusion.EJ2
21+
@{
22+
ViewBag.Title = "Home Page";
23+
}
24+
25+
<div>
26+
<!-- Render PDF Viewer -->
27+
@Html.EJS().PdfViewer("pdfviewer").DocumentPath("https://cdn.syncfusion.com/content/pdf/pdf-succinctly.pdf").Render()
28+
</div>
29+
30+
<!-- Ensure necessary Syncfusion scripts and styles are included -->
31+
<script src="https://cdn.syncfusion.com/ej2/29.1.33/dist/ej2.min.js"></script>
32+
<script type="text/javascript">
33+
window.onload = function () {
34+
// Initialize PDF viewer instance
35+
var viewer = document.getElementById('pdfviewer').ej2_instances[0];
36+
37+
// Set up the event handler for text extraction completion
38+
viewer.extractTextCompleted = function (args) {
39+
console.log('Extracted Text Completed');
40+
// Log the extracted text collection
41+
console.log(args.documentTextCollection);
42+
43+
// Access text data for page index 1 (0-based, so this is the second page)
44+
console.log(args.documentTextCollection[1]);
45+
console.log(args.documentTextCollection[1][1].TextData); // Text data entries for page index 1 (adjust the index as needed)
46+
console.log(args.documentTextCollection[1][1].PageText); // Text from the page
47+
48+
// Extract and log the bounds of the first text in the page
49+
console.log(args.documentTextCollection[1][1].TextData[0].Bounds);
50+
};
51+
52+
// Optionally, trigger the text extraction (for example, from page 1)
53+
viewer.extractText(1, 'TextOnly').then(function (val) {
54+
console.log('Extracted Text from Page 1:');
55+
console.log(val);
56+
});
57+
};
58+
</script>
59+
60+
61+
{% endhighlight %}
62+
{% endtabs %}
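
The handler above indexes `documentTextCollection` directly, which throws if an entry is missing. The following is a minimal sketch of a more defensive reader. It assumes the shape implied by the sample (each entry is an object keyed by page index whose value carries `PageText` and `TextData`). `readPageText` and `mockCollection` are hypothetical names for illustration, not part of the PDF Viewer API.

```javascript
// Hypothetical helper: safely read one page's text from the event payload.
// Assumes collection[pageIndex][pageIndex] holds { PageText, TextData },
// matching the indexing used in the sample above.
function readPageText(collection, pageIndex) {
    var entry = collection && collection[pageIndex];
    var page = entry && entry[pageIndex];
    if (!page) {
        return null; // page not extracted, or the shape differs from the assumption
    }
    var hasItems = Array.isArray(page.TextData) && page.TextData.length > 0;
    return {
        text: page.PageText || '',
        firstBounds: hasItems ? page.TextData[0].Bounds : null
    };
}

// Mock payload for illustration only; real data comes from args.documentTextCollection.
var mockCollection = [
    { 0: { PageText: 'Cover', TextData: [{ Bounds: { X: 1, Y: 2, Width: 3, Height: 4 } }] } },
    { 1: { PageText: 'Intro', TextData: [] } }
];

console.log(readPageText(mockCollection, 0));
```

Inside the event handler, the same helper could be called as `readPageText(args.documentTextCollection, 1)`.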
63+
64+
[View sample in GitHub](https://github.com/SyncfusionExamples/mvc-pdf-viewer-examples/tree/master/How%20to)
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
1+
---
2+
layout: post
3+
title: Extract text Option in EJ2 ASP.NET MVC PDF Viewer | Syncfusion
4+
description: Learn here all about Extract text Option in ASP.NET MVC PDF Viewer component of Syncfusion Essential JS 2 and more.
5+
platform: ej2-asp-core-mvc
6+
control: PDF Viewer
7+
publishingplatform: ##Platform_Name##
8+
documentation: ug
9+
---
10+
11+
# Extract Text Option in Syncfusion PDF Viewer
12+
13+
The `extractTextOption` property in the Syncfusion PdfViewer control allows you to optimize memory usage by controlling the level of text extraction. This setting influences the data returned in the `extractTextCompleted` event. You can select one of the following options to determine the kind of text extraction and layout information to retrieve.
14+
15+
### Available Options:
16+
17+
**None:** No text information is extracted or returned. This is useful when you want to optimize memory usage and don't need any text data.
18+
19+
**TextOnly:** Extracts only the plain text from the document. This option excludes any layout or positional information.
20+
21+
**BoundsOnly:** Extracts layout information, such as bounds or coordinates, without including the plain text data.
22+
23+
**TextAndBounds:** Extracts both the plain text and the layout (bounds) information, which is the default behavior.
24+
25+
The following example demonstrates how to configure the `extractTextOption` property to control the level of text extraction:
26+
27+
28+
29+
{% tabs %}
30+
{% highlight cshtml tabtitle="Standalone" %}
31+
32+
@using Syncfusion.EJ2
33+
@{
34+
ViewBag.Title = "Home Page";
35+
}
36+
37+
<div>
38+
<!-- Render PDF Viewer -->
39+
@Html.EJS().PdfViewer("pdfviewer").DocumentPath("https://cdn.syncfusion.com/content/pdf/pdf-succinctly.pdf").Render()
40+
</div>
41+
42+
<!-- Ensure necessary Syncfusion scripts and styles are included -->
43+
<script src="https://cdn.syncfusion.com/ej2/29.1.33/dist/ej2.min.js"></script>
44+
<script type="text/javascript">
45+
window.onload = function () {
46+
var viewer = document.getElementById('pdfviewer').ej2_instances[0];
47+
viewer.extractTextOption = 'None'; // Options: 'None', 'TextOnly', 'BoundsOnly', 'TextAndBounds'
48+
}
49+
</script>
50+
51+
{% endhighlight %}
52+
{% endtabs %}
53+
54+
### Description of Each Option
55+
**extractTextOption.TextAndBounds (default):** This option returns both plain text and its positional data (bounds). Use this option when you need to access both the content of the PDF and its layout for further processing or analysis.
56+
57+
**extractTextOption.TextOnly:** This option returns only the plain text from the PDF. No positional or layout data is included. Note that the `findText` method does not work with this option; use `findTextAsync` for text searches instead.
58+
59+
**extractTextOption.BoundsOnly:** This option returns only the layout information (bounds) of the text, excluding the actual content. It is useful when the focus is on the position of text elements rather than the text itself.
60+
61+
**extractTextOption.None:** This option does not extract or return any text or layout information. It is used to optimize memory usage when no text extraction is necessary. This setting is only relevant for the `extractTextCompleted` event and cannot be used with the `extractText` method.
62+
63+
N> Text Search: When using the `extractTextOption.TextOnly` or `extractTextOption.None` option, the `findText` method will not work. Use the `findTextAsync` method instead to perform text searches asynchronously.
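
The rule in the note above can be captured in a small helper. This is only a sketch: `pickSearchMethod` is a hypothetical function, and it encodes nothing beyond what the note states (the behavior of `findText` under the other options is assumed, not confirmed here).

```javascript
// Hypothetical helper: choose the search method name for a given extractTextOption.
// Per the note above, findText does not work with 'TextOnly' or 'None'.
function pickSearchMethod(extractTextOption) {
    var asyncOnly = ['TextOnly', 'None'];
    // Assumption: for the remaining options, findText remains usable.
    return asyncOnly.indexOf(extractTextOption) !== -1 ? 'findTextAsync' : 'findText';
}

console.log(pickSearchMethod('None'));
console.log(pickSearchMethod('TextAndBounds'));
```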
64+
65+
[View sample in GitHub](https://github.com/SyncfusionExamples/mvc-pdf-viewer-examples/tree/master/How%20to)

ej2-asp-core-mvc/pdfviewer/EJ2_ASP.MVC/how-to/extract-text.md

Lines changed: 77 additions & 2 deletions
@@ -1,6 +1,6 @@
11
---
22
layout: post
3-
title: Extract Text in ##Platform_Name## Pdfviewer Component
3+
title: Extract Text in Syncfusion ##Platform_Name## Pdfviewer Component
44
description: Learn here all about Extract Text in Syncfusion ##Platform_Name## Pdfviewer component of Syncfusion Essential JS 2 and more.
55
platform: ej2-asp-core-mvc
66
control: Extract Text
@@ -9,6 +9,7 @@ documentation: ug
99
---
1010

1111
# Extract Text from PDF document
12+
## Extract text in server-backed mode
1213

1314
The PDF Viewer server library allows you to extract the text from a page along with its bounds. Text extraction can be done using the ExtractText() method. Add the following dependency to your application using the `NuGet Package Manager`.
1415
* Syncfusion.EJ2.PdfViewer.AspNet.Mvc5
@@ -33,4 +34,78 @@ System.IO.File.WriteAllText("../../Data/data.txt", text);
3334
Sample:
3435
[https://www.syncfusion.com/downloads/support/directtrac/general/ze/ExtractText853154752](https://www.syncfusion.com/downloads/support/directtrac/general/ze/ExtractText853154752)
3536

36-
N>Ensure the provided document path and output text saved locations in your application level.
37+
N>Ensure the provided document path and output text saved locations in your application level.
38+
39+
## Extract Text Method in standalone mode
40+
41+
The `extractText` method of the Syncfusion PdfViewer control enables text extraction from one or more pages in a PDF document. This method is useful for retrieving the text content along with its associated data, such as the bounds of each text element.
42+
43+
### extractText Method
44+
The extractText method retrieves text data from the specified page(s) of a PDF document. It can extract text from one page, a range of pages, or even provide detailed text data, depending on the options specified.
45+
46+
#### Parameters:
47+
**startIndex:** The starting page index for text extraction (0-based index).
48+
49+
**endIndex or isOptions:** Either the ending page index for the text extraction (when extracting from multiple pages) or an option specifying the text extraction criteria for a single page.
50+
51+
**options (optional):** Specifies the level of text extraction: `TextOnly` for plain text, or `TextAndBounds` for text together with its bounds.
52+
53+
***TextOnly:*** Extracts only the plain text content without bounds or additional information.
54+
55+
***TextAndBounds:*** Extracts text content along with its bounds (coordinates) within the PDF.
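
Because the second argument is overloaded, it can help to see how the two call forms line up with the parameters above. The sketch below is a hypothetical illustration (`describeExtractCall` is not part of the PDF Viewer API and does not reflect the library's internal code); it only restates the parameter rules in code.

```javascript
// Hypothetical illustration of how the overloaded arguments can be read.
function describeExtractCall(startIndex, endIndexOrOptions, options) {
    if (typeof endIndexOrOptions === 'number') {
        // Three-argument form: a page range plus an optional extraction option.
        return { startIndex: startIndex, endIndex: endIndexOrOptions, options: options };
    }
    // Two-argument form: a single page plus an extraction option.
    return { startIndex: startIndex, endIndex: startIndex, options: endIndexOrOptions };
}

console.log(describeExtractCall(1, 'TextOnly'));
console.log(describeExtractCall(0, 2, 'TextOnly'));
```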
56+
57+
#### Returns:
58+
The method returns a Promise that resolves to an object containing two properties:
59+
60+
**textData:** An array of TextDataSettingsModel objects, each representing the details of the extracted text (including bounds, page text, etc.).
61+
62+
**pageText:** A concatenated string of plain text extracted from the specified page(s).
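
A minimal sketch of consuming the resolved object, assuming the `{ textData, pageText }` shape described above. `summarizeExtraction` is a hypothetical helper; the guards are there because `textData` may be absent or empty when only plain text is requested (an assumption, not confirmed behavior).

```javascript
// Hypothetical helper: summarize the object that extractText resolves to.
function summarizeExtraction(result) {
    var items = Array.isArray(result.textData) ? result.textData : [];
    var text = typeof result.pageText === 'string' ? result.pageText : '';
    return { itemCount: items.length, charCount: text.length };
}

// Mock result for illustration only.
console.log(summarizeExtraction({ textData: [{}, {}], pageText: 'Hello' }));
```

With a viewer instance, it could be chained as `viewer.extractText(0, 'TextAndBounds').then(summarizeExtraction)`.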
63+
64+
### Usage of extractText in Syncfusion PdfViewer Control
65+
Here is an example that demonstrates how to use the extractText method:
66+
67+
{% tabs %}
68+
{% highlight cshtml tabtitle="Standalone" %}
69+
70+
@using Syncfusion.EJ2
71+
@{
72+
ViewBag.Title = "Home Page";
73+
}
74+
75+
<div>
76+
<!-- Render PDF Viewer -->
77+
<button onclick="ExtractText()">Extract Text</button>
78+
<button onclick="ExtractTexts()">Extract Texts</button>
79+
@Html.EJS().PdfViewer("pdfviewer").DocumentPath("https://cdn.syncfusion.com/content/pdf/pdf-succinctly.pdf").Render()
80+
</div>
81+
82+
<!-- Ensure necessary Syncfusion scripts and styles are included -->
83+
<script src="https://cdn.syncfusion.com/ej2/29.1.33/dist/ej2.min.js"></script>
84+
<script type="text/javascript">
85+
86+
function ExtractText() {
87+
var viewer = document.getElementById('pdfviewer').ej2_instances[0];
88+
viewer.extractText(1, 'TextOnly').then((val) => {
89+
console.log('Extracted Text from Page 1:');
90+
console.log(val); // Logs the extracted text from page 1
91+
});
92+
}
93+
function ExtractTexts() {
94+
var viewer = document.getElementById('pdfviewer').ej2_instances[0];
95+
viewer.extractText(0, 2, 'TextOnly').then((val) => {
96+
console.log('Extracted Text from Pages 0 to 2:');
97+
console.log(val); // Logs the extracted text from pages 0 to 2
98+
});
99+
}
100+
</script>
101+
102+
103+
{% endhighlight %}
104+
{% endtabs %}
105+
106+
#### Explanation:
107+
**Single Page Extraction:** The first `extractText` call extracts text from the page at index 1 (`startIndex = 1`; the index is 0-based, so this is the second page), using the `TextOnly` option for plain text extraction.
108+
109+
**Multiple Pages Extraction:** The second extractText call extracts text from pages 0 through 2 (`startIndex = 0, endIndex = 2`), using the `TextOnly` option for plain text extraction.
110+
111+
[View sample in GitHub](https://github.com/SyncfusionExamples/mvc-pdf-viewer-examples/tree/master/How%20to)
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
1+
---
2+
layout: post
3+
title: Find Text Async in EJ2 ASP.NET MVC PDF Viewer | Syncfusion
4+
description: Learn about the `findTextAsync` in ASP.NET MVC PDF Viewer component of Syncfusion Essential JS 2 and more.
5+
platform: ej2-asp-core-mvc
6+
control: PDF Viewer
7+
publishingplatform: ##Platform_Name##
8+
documentation: ug
9+
---
10+
11+
# Find Text using findTextAsync Method in Syncfusion PdfViewer
12+
13+
The findTextAsync method in the Syncfusion PdfViewer control allows you to search for specific text or an array of strings asynchronously within a PDF document. The method returns the bounding rectangles for each occurrence of the search term, allowing you to find and work with text positions in the document.
14+
15+
Here is an example of how you can use the **findTextAsync** method:
16+
17+
18+
{% tabs %}
19+
{% highlight cshtml tabtitle="Standalone" %}
20+
21+
@using Syncfusion.EJ2
22+
@{
23+
ViewBag.Title = "Home Page";
24+
}
25+
26+
<div>
27+
<!-- Render PDF Viewer -->
28+
<button id="findTextBtn" onclick="findText()">Find Text</button>
29+
<button id="findTextsBtn" onclick="findTexts()">Find Multiple Texts</button>
30+
@Html.EJS().PdfViewer("pdfviewer").DocumentPath("https://cdn.syncfusion.com/content/pdf/pdf-succinctly.pdf").Render()
31+
</div>
32+
33+
<!-- Ensure necessary Syncfusion scripts and styles are included -->
34+
<script src="https://cdn.syncfusion.com/ej2/29.1.33/dist/ej2.min.js"></script>
35+
<script type="text/javascript">
36+
37+
function findText() {
38+
var viewer = document.getElementById('pdfviewer').ej2_instances[0];
39+
// Search for a single text ('pdf') across all pages (case insensitive)
40+
viewer.textSearchModule.findTextAsync('pdf', false).then(function (res) {
41+
console.log(res); // Log the search results
42+
});
43+
}
44+
function findTexts() {
45+
var viewer = document.getElementById('pdfviewer').ej2_instances[0];
46+
// Search for multiple texts (['pdf', 'the']) across all pages (case insensitive)
47+
viewer.textSearchModule.findTextAsync(['pdf', 'the'], false).then(function (res) {
48+
console.log(res); // Log the search results
49+
});
50+
}
51+
</script>
52+
53+
{% endhighlight %}
54+
{% endtabs %}
55+
56+
### Description:
57+
58+
The `findTextAsync` method is designed for performing an asynchronous text search within a PDF document. You can use it to search for a single string or multiple strings, with the ability to control case sensitivity. By default, the search is applied to all pages of the document. However, you can adjust this behavior by specifying the page number (pageIndex), which allows you to search only a specific page if needed.
59+
60+
### Parameters:
61+
62+
**text (string | string[]):**
63+
64+
The text or an array of texts you want to search for in the document.
65+
66+
**matchCase (boolean):**
67+
68+
Indicates whether the search should be case-sensitive.
69+
When set to true, the search will match the exact case.
70+
When set to false, the search will ignore case differences.
71+
72+
**pageIndex (optional, number):**
73+
74+
Specifies the page number (zero-based index) to search within the document.
75+
If not provided, the search will be performed across all pages in the document.
76+
For example, passing 0 would search only the first page of the document.
77+
78+
### Example Workflow:
79+
80+
**findTextAsync('pdf', false):**
81+
This will search for the term "pdf" in a case-insensitive manner across all pages of the document.
82+
83+
**findTextAsync(['pdf', 'the'], false):**
84+
This will search for the terms "pdf" and "the" in a case-insensitive manner across all pages of the document.
85+
86+
**findTextAsync('pdf', false, 0):**
87+
This will search for the term "pdf" in a case-insensitive manner only on the first page (page 0).
88+
89+
**findTextAsync(['pdf', 'the'], false, 1):**
90+
This will search for the terms "pdf" and "the" in a case-insensitive manner only on the second page (page 1).
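
To make the roles of `text`, `matchCase`, and `pageIndex` concrete, here is a small stand-alone sketch that applies the same parameter semantics to plain strings. It is an illustration only, not the PDF Viewer's implementation: `countMatches` and its `pages` input (an array of page-text strings) are hypothetical, and it returns counts rather than bounding rectangles.

```javascript
// Illustration of the parameter semantics only.
// pages: array of page-text strings; the array index is the zero-based page number.
function countMatches(pages, text, matchCase, pageIndex) {
    var terms = Array.isArray(text) ? text : [text];
    var first = pageIndex === undefined ? 0 : pageIndex;
    var last = pageIndex === undefined ? pages.length - 1 : pageIndex;
    var counts = {};
    terms.forEach(function (term) {
        var needle = matchCase ? term : term.toLowerCase();
        counts[term] = 0;
        if (needle.length === 0) {
            return; // nothing to search for
        }
        for (var i = first; i <= last; i++) {
            if (typeof pages[i] !== 'string') {
                continue; // page index out of range
            }
            var haystack = matchCase ? pages[i] : pages[i].toLowerCase();
            var pos = haystack.indexOf(needle);
            while (pos !== -1) {
                counts[term]++;
                pos = haystack.indexOf(needle, pos + needle.length);
            }
        }
    });
    return counts;
}

var samplePages = ['PDF and pdf', 'the Pdf the'];
console.log(countMatches(samplePages, 'pdf', false));
console.log(countMatches(samplePages, ['pdf', 'the'], false, 1));
```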
91+
92+
[View sample in GitHub](https://github.com/SyncfusionExamples/mvc-pdf-viewer-examples/tree/master/How%20to)
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
1+
---
2+
layout: post
3+
title: extractTextCompleted Event in Syncfusion ##Platform_Name## Pdfviewer Component
4+
description: Learn here all about extractTextCompleted Event in Syncfusion ##Platform_Name## Pdfviewer component of Syncfusion Essential JS 2 and more.
5+
platform: ej2-asp-core-mvc
6+
control: Extract Text Completed
7+
publishingplatform: ##Platform_Name##
8+
documentation: ug
9+
---
10+
11+
# Extract text using extractTextCompleted Event
12+
13+
The PDF Viewer library allows you to extract the text from a page along with its bounds. Text extraction is performed using the **isExtractText** property and the [**extractTextCompleted**](https://help.syncfusion.com/cr/aspnetcore-js2/Syncfusion.EJ2.PdfViewer.PdfViewer.html#Syncfusion_EJ2_PdfViewer_PdfViewer_ExtractTextCompleted) event. The `extractTextCompleted` event triggers when text extraction is completed in the PDF Viewer.
14+
15+
Here is an example of how you can use the extractTextCompleted event:
16+
17+
{% tabs %}
18+
{% highlight cshtml tabtitle="Standalone" %}
19+
20+
@page "{handler?}"
21+
@model IndexModel
22+
@{
23+
ViewData["Title"] = "Home page";
24+
}
25+
26+
<div class="text-center">
27+
<ejs-pdfviewer id="pdfviewer" style="height:600px" resourceUrl="https://cdn.syncfusion.com/ej2/29.1.33/dist/ej2-pdfviewer-lib" documentPath="https://cdn.syncfusion.com/content/pdf/pdf-succinctly.pdf" isExtractText=true>
28+
</ejs-pdfviewer>
29+
</div>
30+
31+
<script type="text/javascript">
32+
document.addEventListener('DOMContentLoaded', function () {
33+
var viewer = document.getElementById('pdfviewer').ej2_instances[0];
34+
viewer.isExtractText = true;
35+
viewer.extractTextCompleted = args => {
36+
// Log the complete event arguments for the loaded document
37+
console.log(args);
38+
console.log(args.documentTextCollection[1]);
39+
//Extract the Text data.
40+
console.log(args.documentTextCollection[1][1].TextData);
41+
//Extract Text in the Page.
42+
console.log(args.documentTextCollection[1][1].PageText);
43+
// Log the bounds of the first text item on the page
44+
console.log(args.documentTextCollection[1][1].TextData[0].Bounds);
45+
};
46+
});
47+
</script>
48+
49+
{% endhighlight %}
50+
{% endtabs %}
51+
52+
[View sample in GitHub](https://github.com/SyncfusionExamples/asp-core-pdf-viewer-examples/tree/master/How%20to)
