This article explains how to extract tables from PDF documents. You might have many PDF files containing multiple tables that you want to use separately. Copying and pasting those tables is not a good option because it may not give the expected output. You therefore need other simple options that can extract tables from a PDF file and save them as separate files.

Most of these PDF table extractor tools cannot help if the table in the PDF is scanned. In that case, first make the PDF searchable and then try these options.

Extract tables from PDF documents

This post covers two free online services and three free software programs for extracting tables from a PDF file:

PDF to XLS
PDFtoExcel.com
Tabula
ByteScout PDF Multitool
Sejda PDF Desktop

1] PDF to XLS

[Image: PDF to XLS service]

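PDF to XLS is one of the best options for extracting tables from PDFs. Two features make it handy: it can fetch tables from 20 PDF documents at once, and the table extraction is automatic. It generates the output as an XLSX file. If a PDF has multiple tables, each table is stored on a separate sheet of the output XLSX file.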
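Because each table lands on its own sheet, a quick way to confirm how many tables were extracted is to list the sheet names in the output file. The following is a minimal sketch using only the Python standard library. It assumes the workbook part sits at the conventional `xl/workbook.xml` path inside the XLSX archive; the function name `list_sheet_names` is hypothetical and not part of the service.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the main workbook part of an XLSX (SpreadsheetML) file.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def list_sheet_names(xlsx_path):
    """Return the worksheet names stored in an XLSX file, in workbook order."""
    # An XLSX file is a ZIP archive; sheet names are declared in the
    # workbook part (assumed here to be at xl/workbook.xml).
    with zipfile.ZipFile(xlsx_path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.iter(MAIN_NS + "sheet")]
```

Calling `list_sheet_names("converted.xlsx")`, where `converted.xlsx` stands in for a downloaded output file, returns one name per sheet, which should match the number of tables in the source PDF.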

Open the homepage of this service. Then drag and drop PDF files or use the UPLOAD FILES button. Each uploaded PDF is converted to an XLSX file automatically. When the output files are ready, you can download them one by one or download a ZIP file containing all of them.
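If you choose the ZIP download, the workbooks can be unpacked in one step. This is a small sketch with Python's standard `zipfile` module; the function name `unpack_xlsx_files` and the idea of keeping only `.xlsx` members are assumptions for illustration, not something the service prescribes.

```python
import zipfile
from pathlib import Path


def unpack_xlsx_files(zip_path, dest_dir):
    """Extract only the .xlsx members of a ZIP archive and return their paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".xlsx"):
                # extract() returns the path of the file it created.
                extracted.append(Path(archive.extract(member, dest)))
    return sorted(extracted)
```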

2] PDFtoExcel.com

The PDFtoExcel.com service can extract tables from only one PDF at a time, but it supports multiple sources for uploading the PDF: OneDrive, desktop, Google Drive, and Dropbox. The conversion process is also automatic.

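Open the homepage of this service and select an upload option to add a PDF. The service then automatically uploads the PDF and converts it to an Excel (XLSX) file. When the output is ready, you get a download link to save the output file containing the PDF table(s).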

Note: Although this service states that it can also extract tables from scanned PDF files, it did not work for me. You can still try it with a scanned PDF.

3] Tabula

[Image: Tabula PDF extractor software]

Tabula is powerful software that automatically detects the tables present in a PDF and lets you save them as a TSV, JSON, or CSV file. You can choose to save a separate CSV file for each PDF table or to save all the tables in a single CSV file.
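If you exported one CSV per table and later want a single file after all, the files can be concatenated with Python's standard `csv` module. This is only a sketch: `merge_csv_tables` is a hypothetical helper, it assumes UTF-8 files, and it simply appends rows in the order given without reconciling differing column layouts.

```python
import csv


def merge_csv_tables(csv_paths, merged_path):
    """Append the rows of several per-table CSV files into one CSV file."""
    total_rows = 0
    with open(merged_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for path in csv_paths:
            with open(path, newline="", encoding="utf-8") as in_file:
                for row in csv.reader(in_file):
                    writer.writerow(row)
                    total_rows += 1
    # The count makes it easy to check that no rows were lost.
    return total_rows
```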

To use this open-source PDF table extractor, download it first. It also requires Java to run.

Extract the ZIP file that you downloaded and run the tabula.exe file. It opens a page in your default browser. If the page does not open, enter http://localhost:8080 in your browser's address bar and press Enter.
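If the page still does not load, it can help to check whether anything is listening on port 8080 at all. The sketch below uses Python's standard `socket` module; the function name `is_port_open` is hypothetical, and a successful connection only shows that some program is listening on that port, not that it is Tabula.

```python
import socket


def is_port_open(host="localhost", port=8080, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```

If `is_port_open()` returns False right after starting tabula.exe, the application may still be starting up or Java may be missing.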

You will now see its interface, where you can use the Browse option to add a PDF. Then press the Import button. Once the PDF is added, its pages are shown in the interface.

Use the Autodetect Tables button to highlight all the tables present in that PDF automatically. You can also highlight a table manually by selecting it, and you can remove any selected tables you do not want.

This helps you save only the tables you want. When the PDF tables are highlighted, click the Preview & Export Extracted Data button.

Finally, use the drop-down menu at the top to select an output format and press the Export button. This saves the PDF tables to a file in the output format you selected.

4] ByteScout PDF Multitool

[Image: ByteScout PDF Multitool, adding a PDF and detecting tables]

As the name suggests, this software comes with multiple tools. They include converting a PDF to multipage TIFF, rotating a PDF document, making a PDF unsearchable, optimizing a PDF, adding an image to a PDF, and more. It also has a PDF table detection feature, which is quite good. The advantage of this tool is that it can extract tables from scanned PDFs too. You can detect tables across multiple pages and extract them as a CSV, XLS, XML, TXT, or JSON file. Before extraction, you can also set a page range to extract tables from specified pages only.
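To make the idea of a page range concrete, here is a small Python sketch that expands a range string into individual page numbers. The `1-3,5` syntax and the function name `parse_page_range` are assumptions for illustration; the syntax that ByteScout PDF Multitool accepts may differ.

```python
def parse_page_range(spec):
    """Expand a page-range string such as "1-3,5" into a sorted list of pages."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    # Pages are numbered from 1 in this sketch.
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)
```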

This software is free for non-commercial use only. After installation, run the software and use the Open Document option to add a PDF. Then click the Detect tables tool, which is in the Data Extraction category.

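This opens a box where you can set the conditions for detecting tables. For example, you can set the minimum number of columns and rows, the minimum number of line breaks between tables, and the table detection mode (bordered or borderless tables). Adjust the options or keep the default settings.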
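The effect of the minimum row and column settings can be illustrated with a short Python sketch that filters candidate tables by size. This is not ByteScout's detection algorithm; the names `meets_minimum_size` and `filter_tables`, and the representation of a table as a list of rows, are assumptions made for the example.

```python
def meets_minimum_size(table, min_rows=2, min_columns=2):
    """Return True if a table (a list of rows) has enough rows and columns."""
    if len(table) < min_rows:
        return False
    # Use the widest row so that ragged rows do not hide real columns.
    widest = max((len(row) for row in table), default=0)
    return widest >= min_columns


def filter_tables(tables, min_rows=2, min_columns=2):
    """Keep only the candidate tables that satisfy the size thresholds."""
    return [t for t in tables if meets_minimum_size(t, min_rows, min_columns)]
```

Raising the thresholds discards small candidates such as a single line of text, which is one way stray content gets excluded from detection.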

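Then press the Detect next table button in that box. It identifies and selects a table on the current page. In the same way, you can move to another page and detect more tables.

[Image: Detecting tables and saving PDF tables in the selected output format]

When you are done, use the Proceed to extraction button and select the output format. Finally, choose whether to save the tables from the current page or from a page range that you define, and save the output.

The tool gives satisfactory output. Sometimes, however, it detects other content in the PDF as well, or it cannot extract tables from multiple pages. In that case, use it to fetch and save tables one by one.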

5] Sejda PDF Desktop

[Image: Sejda PDF Desktop with the PDF to Excel converter]

Sejda PDF Desktop is also multipurpose software. It can optimize or compress a PDF, add a watermark to a PDF, remove restrictions from a PDF, edit a PDF document, and more. However, its free plan has limitations: only 3 tasks can be performed per day, and the PDF size limit is 50 MB or 10 pages.

You can use its PDF to Excel conversion tool to extract PDF tables. It automatically detects the tables in the PDF pages and lets you save them as XLSX or CSV.

After installation, open the PDF to Excel tool from the main interface, then use the Choose PDF files button. Only one PDF can be added in the free plan.

Once the PDF is added, the tool shows the Convert PDF to CSV and Convert PDF to Excel buttons. Click either button to save the output to the desired location on your PC.

[Image: Converting a PDF table to Excel or CSV]

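Its PDF table detection works well, and you do not have to detect tables manually. Still, it sometimes treats other text content as a PDF table and stores it in the output. Overall, though, the results are good.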
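When stray text ends up in a CSV export, one simple cleanup is to drop rows that have too few filled cells. The Python sketch below is a heuristic under stated assumptions, not a feature of Sejda: the function name `drop_stray_text_rows` and the default threshold of two filled cells are hypothetical, and a genuine single-cell table row would be removed as well.

```python
import csv
import io


def drop_stray_text_rows(csv_text, min_filled_cells=2):
    """Remove rows that have fewer than min_filled_cells non-empty cells."""
    kept = []
    for row in csv.reader(io.StringIO(csv_text)):
        filled = sum(1 for cell in row if cell.strip())
        if filled >= min_filled_cells:
            kept.append(row)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(kept)
    return out.getvalue()
```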

That's all.

These are some good tools for extracting tables from PDFs. Tabula is more effective than the other tools. Still, you can try all of them and see which one works best for you.



拓真山田

About the author

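I graduated in computer engineering from the University of Utah and have more than 10 years of experience in software development and Windows development. I have worked with PDF and Office documents, and I also have experience building gadgets on the iOS and Android platforms.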
