Kendra, a self built enterprise search engine, is very simple

Time:2020-10-25

Faced with the vast amount of data on the Internet, how do you quickly find the information you need? With a search engine: enter your keywords, press Enter, and a mass of results is waiting for you to review.

But in an enterprise environment, with its variety of IT systems, document libraries, and other data sources, how do you quickly find the content you need? You search, of course. But suppose the information is distributed across several or even dozens of different applications or systems. Is there a better way than searching each of these places separately (provided the systems support search at all)?

What if there were a service that, like a search engine on the Internet, could index all kinds of data sources within the enterprise, let us search all of them from one location with a single search operation, and present the results together in a unified way?

Meet Amazon Kendra

Amazon Kendra is an easy-to-use enterprise search service that helps us add search capabilities to our applications, so that end users can easily find information stored across different data sources within the enterprise (including receipts, business documents, technical manuals, sales reports, internal company glossaries, internal websites, etc.). Beyond that, we can also search Amazon Simple Storage Service (Amazon S3) and OneDrive, applications such as Salesforce, SharePoint, and ServiceNow, and relational databases such as Amazon Relational Database Service (Amazon RDS).

When you enter a query, the service uses machine learning (ML) algorithms to understand the context and returns the most relevant results, which can be a precise answer or an entire document. More importantly, you can operate the service without any machine learning experience. Amazon Kendra also provides code that can be easily integrated with new or existing applications.

This article describes how to use Amazon Kendra to build an internal enterprise search solution in which you create and query your own search index. The example in this article uses Amazon.com help documents as the data source, but Amazon Kendra also supports Microsoft Office (.doc, .ppt, etc.), PDF, and many other text formats.

Solution overview

This article shows how to use Amazon Kendra to create an enterprise search engine on AWS. You can configure a new Amazon Kendra index within an hour, without deep technical knowledge or extensive machine learning experience.

This article also demonstrates how to customize the Amazon Kendra experience by adding answers to frequently asked questions (FAQs), deploying Amazon Kendra in a custom application, and synchronizing data sources. These topics are covered in detail later.

Prerequisites

For this walkthrough, complete the following preparation first:

Create and configure document libraries

You need to upload your documents to an S3 bucket before you can create an index in Amazon Kendra. This section describes how to create an S3 bucket, then get the files and load them into the bucket. After completing all the steps in this section, you will have a data source that Amazon Kendra can use.

  1. In the AWS Management Console, select US East (N. Virginia) or another region where you want Amazon Kendra to run (please ensure that Kendra is available in that region).
  2. Select Services.
  3. Under Storage, select S3.
  4. On the Amazon S3 console, select Create bucket.
  5. Under General configuration, provide the following information:
  • Bucket name: kendrapost-{your account id}
  • Region: select the same region used to deploy the Amazon Kendra index (this article uses US East (N. Virginia), us-east-1).
  6. Under Bucket settings for Block Public Access, keep the default values.
  7. Under Advanced settings, keep the default values.
  8. Select Create bucket.
  9. Download amazon_help_docs.zip and unzip the file.
  10. On the Amazon S3 console, select the bucket you just created, and then select Upload.
  11. Upload the extracted files.

At this point, you should see two folders in the bucket: amazon_help_docs (containing 3100 objects) and FAQs (containing 1 object).
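For readers who prefer scripting to the console, the bucket setup can be sketched in Python. This is only a minimal sketch and not part of the original walkthrough: it assumes the boto3 SDK, and the helper names `bucket_name`, `create_bucket_kwargs`, and `upload_folder` are hypothetical. The S3 client is passed in as a parameter, so nothing here contacts AWS until you supply a real client.

```python
import os


def bucket_name(account_id: str) -> str:
    """Bucket name used in this walkthrough: kendrapost-{your account id}."""
    return f"kendrapost-{account_id}"


def create_bucket_kwargs(account_id: str, region: str = "us-east-1") -> dict:
    """Build keyword arguments for the S3 create_bucket call."""
    kwargs = {"Bucket": bucket_name(account_id)}
    # us-east-1 is the S3 default region and must not be sent as a LocationConstraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def upload_folder(s3_client, local_root: str, bucket: str) -> int:
    """Upload every file under local_root, using relative paths as object keys."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            key = os.path.relpath(path, local_root).replace(os.sep, "/")
            s3_client.upload_file(path, bucket, key)
            count += 1
    return count


# Usage sketch (assumes boto3 is installed and credentials are configured):
#   import boto3
#   s3 = boto3.client("s3", region_name="us-east-1")
#   s3.create_bucket(**create_bucket_kwargs("<your account id>"))
#   upload_folder(s3, "<folder with the extracted files>", bucket_name("<your account id>"))
```

Passing the client in keeps the helpers easy to try out with a stand-in object before pointing them at a real account.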

The following screen capture shows the contents of the amazon_help_docs folder:

The following screen capture shows the contents of the FAQs folder:

Create index

An index is the Amazon Kendra component that provides search results for documents and frequently asked questions. After completing all the steps in this section, we will be able to use the index to search documents from different data sources. For more details about indexes, see Indexes.

To create the first Amazon Kendra index, complete the following steps:

  1. On the console, select Services.
  2. Under Machine Learning, select Amazon Kendra.
  3. On the Amazon Kendra home page, select Create an index.
  4. In the Index details section, for Index name, enter kendra-blog-index.
  5. For Description, enter my first Kendra index.
  6. For IAM role, select Create a new role.
  7. For Role name, enter -index-role (the role name is prefixed with AmazonKendra-{your region}-).
  8. For Encryption, do not select Use an AWS KMS managed encryption key. (By default, our data is encrypted with a key owned by Amazon Kendra.)
  9. Select Next.

For more details on the IAM roles created by Amazon Kendra, see Prerequisites.

Amazon Kendra is offered in two editions. The Enterprise Edition provides a high-availability service for production workloads, while the Developer Edition is suited to proofs of concept and experimentation. This article uses the Developer Edition.

  10. In the Provisioning editions section, select Developer Edition.
  11. Select Create.

For more details on the free tier, document size limits, and total storage for each Amazon Kendra edition, see Amazon Kendra pricing.

The index creation process can take up to 30 minutes. After the creation, we will see a message at the top of the page that the index has been successfully created.
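The same index can, in principle, be created through the API instead of the console. The following is a hedged sketch assuming the boto3 Kendra client; the helper names are hypothetical. Unlike the console, the API does not create the IAM role for you, so the role ARN is assumed to exist already. The polling loop mirrors the "up to 30 minutes" wait described above.

```python
import time


def create_index_request(name: str, role_arn: str, description: str = "") -> dict:
    """Build arguments for the Kendra create_index call (Developer Edition)."""
    request = {
        "Name": name,
        "Edition": "DEVELOPER_EDITION",  # the alternative is ENTERPRISE_EDITION
        "RoleArn": role_arn,
    }
    if description:
        request["Description"] = description
    return request


def wait_until_active(kendra_client, index_id: str,
                      poll_seconds: float = 30.0, max_polls: int = 60) -> str:
    """Poll describe_index until the index leaves the CREATING state."""
    status = "CREATING"
    for _ in range(max_polls):
        status = kendra_client.describe_index(Id=index_id)["Status"]
        if status != "CREATING":
            break
        time.sleep(poll_seconds)
    return status


# Usage sketch (assumes boto3, credentials, and an existing IAM role):
#   import boto3
#   kendra = boto3.client("kendra", region_name="us-east-1")
#   index_id = kendra.create_index(**create_index_request(
#       "kendra-blog-index", "<role ARN>", "my first Kendra index"))["Id"]
#   print(wait_until_active(kendra, index_id))
```

The function returns the last status it saw, so a caller can distinguish ACTIVE from FAILED or from a timeout while the index is still CREATING.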

Add data source

A data source is the location where documents are stored for indexing. We can automatically synchronize the data source with the Amazon Kendra index to ensure that the search can correctly reflect new, updated or deleted documents in the source repository.

After completing all the steps in this section, we will have a data source linked to Amazon Kendra. For more details, see Add document from data source.

Before proceeding to the next step, ensure that the index has been created and the index status is displayed as active.

  1. On the kendra-blog-index page, select Add data sources.

Amazon Kendra supports six data source types: Amazon S3, SharePoint Online, ServiceNow, OneDrive, Salesforce Online, and Amazon RDS. Let's take Amazon S3 as an example.

  2. Under Amazon S3, select Add connector.

For more information about the various data sources supported by Amazon Kendra, see Add document from data source.

  3. In the Define attributes section, for Data source name, enter amazon_help_docs.
  4. For Description, enter AWS services documentation.
  5. Select Next.
  6. In the Configure settings section, for Enter the data source location, enter the S3 bucket you just created: kendrapost-{your account id}.
  7. Leave Metadata files prefix folder location at its default.

By default, metadata files are stored in the same directory as the documents. If you want to place these files in a different folder, you can do so by adding a prefix. For more details, see S3 document metadata.

  8. In the Select encryption key section, clear all check boxes.
  9. For Role name, enter source-role (the role name is prefixed with AmazonKendra-).
  10. In the Additional configuration section, you can add patterns to include or exclude certain folders or files. For the example in this article, keep the default values.
  11. In the Frequency section, select Run on demand. This step defines how often the data source is synchronized with the Amazon Kendra index. For this walkthrough, synchronize manually (only once).
  12. Select Next.
  13. On the Review and create page, select Create.
  14. After the data source is created, select Sync now to synchronize the documents with the Amazon Kendra index.

The duration of the entire synchronization process depends on the number of documents indexed. In this use case, it may take 15 minutes, after which you should see a message that the synchronization was successful.

In the sync run history section, you can see that 3099 documents have been synchronized.
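The data source steps in this section can be approximated in code as well. This sketch assumes the boto3 Kendra client and an existing IAM role; the helper names are hypothetical. Leaving out a schedule corresponds to the Run on demand choice, and the explicit sync call corresponds to Sync now.

```python
import time


def s3_data_source_request(index_id: str, bucket: str, role_arn: str,
                           name: str = "amazon_help_docs",
                           description: str = "AWS services documentation") -> dict:
    """Build arguments for create_data_source with an S3 connector."""
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "S3",
        "Configuration": {"S3Configuration": {"BucketName": bucket}},
        "Description": description,
        "RoleArn": role_arn,
        # No "Schedule" key: the data source only syncs when asked to.
    }


def create_and_sync(kendra_client, request: dict,
                    poll_seconds: float = 60.0, max_polls: int = 30):
    """Create the data source, wait for it to be ready, then start one sync job."""
    index_id = request["IndexId"]
    data_source_id = kendra_client.create_data_source(**request)["Id"]
    status = "CREATING"
    for _ in range(max_polls):
        status = kendra_client.describe_data_source(
            Id=data_source_id, IndexId=index_id)["Status"]
        if status != "CREATING":
            break
        time.sleep(poll_seconds)
    if status != "ACTIVE":
        raise RuntimeError(f"data source is {status}, not ACTIVE")
    execution = kendra_client.start_data_source_sync_job(
        Id=data_source_id, IndexId=index_id)
    return data_source_id, execution["ExecutionId"]
```

The returned execution ID identifies the sync run, which is the programmatic counterpart of an entry in the sync run history.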

Use the search console to browse the search index

This section explores a few search queries using the search console built into Amazon Kendra.

To search a previously created index, complete the following steps:

  1. Under Indexes, select kendra-blog-index.
  2. Select Search console.

Kendra can answer three types of questions: factoid, descriptive, and keyword questions. For more details, see the Amazon Kendra FAQ. We can now ask questions about the Amazon.com help documents uploaded earlier.

In the search field, enter: what is Amazon music unlimited?

For such a factual question (who, what, when, where), Amazon Kendra can quickly answer and provide a link to the source document.

Next, enter the keyword search shipping rates to Canada. Amazon Kendra's answer is shown in the accompanying screenshot.
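The two console searches in this section have a programmatic equivalent in the Kendra query operation. The sketch below assumes the boto3 Kendra client; `summarize_results` and `search` are hypothetical helper names. It reduces each result item to its type, title, excerpt, and source link, which is roughly what the search console displays.

```python
def summarize_results(response: dict) -> list:
    """Reduce a Kendra query response to (type, title, excerpt, uri) tuples."""
    rows = []
    for item in response.get("ResultItems", []):
        rows.append((
            item.get("Type", ""),  # ANSWER, QUESTION_ANSWER, or DOCUMENT
            item.get("DocumentTitle", {}).get("Text", ""),
            item.get("DocumentExcerpt", {}).get("Text", ""),
            item.get("DocumentURI", ""),
        ))
    return rows


def search(kendra_client, index_id: str, text: str) -> list:
    """Run one query against the index and summarize the result items."""
    return summarize_results(kendra_client.query(IndexId=index_id, QueryText=text))


# Usage sketch (assumes boto3 and credentials):
#   import boto3
#   kendra = boto3.client("kendra", region_name="us-east-1")
#   for row in search(kendra, "<index id>", "what is Amazon music unlimited?"):
#       print(row)
```

Result items of type QUESTION_ANSWER come from uploaded FAQs, so the same helper can be reused to check FAQ matches.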

Add FAQ

You can also upload a list of frequently asked questions (FAQs) to provide direct answers to common questions from end users. To do this, we need to load a .csv file that contains the questions and their answers. This section describes how to create and configure this file and load it into Amazon Kendra.
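If you want to build your own FAQ file rather than use the one in the download, a small script can produce it. This is a minimal sketch using only the Python standard library; it assumes each row holds a question, an answer, and a source URL, matching the example row shown later in this section. The function name `faq_csv` is hypothetical.

```python
import csv
import io


def faq_csv(rows) -> str:
    """Render (question, answer, source_url) tuples as fully quoted CSV text."""
    buffer = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes; embedded quotes are doubled.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for question, answer, source_url in rows:
        writer.writerow([question, answer, source_url])
    return buffer.getvalue()


# Usage sketch: write the text to a file before uploading it to the S3 bucket.
#   with open("kendrapost.csv", "w", encoding="utf-8", newline="") as handle:
#       handle.write(faq_csv([("Question?", "Answer.", "https://example.com/help")]))
```

Letting the csv module handle quoting avoids broken rows when an answer itself contains commas or quotation marks.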

  1. On the Amazon Kendra console, navigate to the index.
  2. Under Data management, select FAQs.
  3. Select Add FAQ.
  4. In the Define FAQ items section, for FAQ name, enter kendra-post-faq.
  5. For Description, enter my first FAQ list.

Amazon Kendra accepts .csv files in which each row contains a question followed by its answer. See the following table for details:

Here's a look at the .csv file format used in this example:

"How do I sign up for the Amazon Prime free Trial?","To sign up for the Amazon Prime free trial, your account must have a current, valid credit card. Payment options such as an Amazon.com Corporate Line of Credit, checking accounts, pre-paid credit cards, or gift cards cannot be used.","https://www.amazon.com/gp/help/customer/display.html/ref=hp_left_v4_sib?ie=UTF8&nodeId=201910190"

  6. In the S3 section under FAQ settings, enter s3://kendrapost-{your account id}/FAQs/kendrapost.csv.
  7. For IAM role, select Create a new role.
  8. For Role name, enter faqs-role (the role name is prefixed with AmazonKendra-{your region}-).
  9. Select Add.
  10. Wait a moment until the status shows as Active.

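Registering the FAQ file can also be sketched against the API. The example assumes the boto3 Kendra client and an existing IAM role; `parse_s3_uri` and `create_faq_request` are hypothetical helpers. It splits the S3 location entered in the console into the bucket and key that the create_faq operation expects.

```python
def parse_s3_uri(uri: str):
    """Split an s3://bucket/key URI into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both bucket and key: {uri}")
    return bucket, key


def create_faq_request(index_id: str, name: str, s3_uri: str,
                       role_arn: str, description: str = "") -> dict:
    """Build arguments for the Kendra create_faq call with a plain CSV file."""
    bucket, key = parse_s3_uri(s3_uri)
    request = {
        "IndexId": index_id,
        "Name": name,
        "S3Path": {"Bucket": bucket, "Key": key},
        "RoleArn": role_arn,
        "FileFormat": "CSV",  # plain CSV without a header row
    }
    if description:
        request["Description"] = description
    return request


# Usage sketch (assumes boto3, credentials, and an existing IAM role):
#   import boto3
#   kendra = boto3.client("kendra", region_name="us-east-1")
#   kendra.create_faq(**create_faq_request(
#       "<index id>", "kendra-post-faq",
#       "s3://kendrapost-<your account id>/FAQs/kendrapost.csv", "<role ARN>"))
```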
Now we can check in the search console that the FAQ works.

  1. Under Indexes, select our index.
  2. Under Data management, select Search console.
  3. In the search field, enter how do I sign up for the Amazon Prime free trial?

Amazon Kendra adds the FAQ uploaded earlier to the results list.

Using Amazon Kendra in your own applications

We can add the following components from the search console to our application:

  • Main search page: the main page that contains all the components. This is where you integrate your application with the Amazon Kendra API.
  • Search bar: a component in which you enter search terms and call the search function.
  • Results: a component that displays the Amazon Kendra results. It has three parts: best answer, FAQ results, and suggested documents.
  • Pagination: a component that paginates the Amazon Kendra response.

Amazon Kendra also provides source code that can be deployed on your website. The code is released under a modified MIT license, so we can use it as is or modify it to suit our needs.

This section focuses on how to deploy Amazon Kendra search on our own website. We will use Node.js; this use case is based on a macOS environment.

To run this demonstration, you need the following:

  • npm (Node.js)
  • IAM credentials for a user with appropriate permissions to access Amazon Kendra
  • The amazon_aws-kendra-sample-app-master.zip file, downloaded and extracted

Complete the following steps:

  1. Open a terminal window and go to the aws-kendra-sample-app-master folder:

cd /{folder path}/aws-kendra-sample-app-master

  2. Create a copy of the .env.development.local.example file named .env.development.local:

cp .env.development.local.example .env.development.local

  3. Edit the .env.development.local file and add the following connection parameters:

  • REACT_APP_INDEX: the Amazon Kendra index ID (which can be found on the index home page).
  • REACT_APP_AWS_ACCESS_KEY_ID: the account access key.
  • REACT_APP_AWS_SECRET_ACCESS_KEY: the account secret access key.
  • REACT_APP_AWS_SESSION_TOKEN: in this use case, leave this blank.
  • REACT_APP_AWS_DEFAULT_REGION: the region used to deploy the Kendra index (for example, us-east-1).

  4. Save all changes.
  5. Install the Node.js dependencies:

npm install

  6. Start the local development server:

npm start

  7. View the demo app at http://localhost:3000/. You should see the app's main search page.
  8. Enter the same question used earlier to test the FAQ: how do I sign up for the Amazon Prime free trial?

As the screen capture shows, even though the demo web page runs locally on the computer, the results are exactly the same as those we got from the Amazon Kendra console.
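The connection parameters from step 3 can also be written by a script instead of by hand. The following is a small illustrative sketch using only the Python standard library; the function name `render_env_file` is hypothetical, and the variable names are the ones listed in step 3. The session token defaults to an empty value, as in this walkthrough.

```python
def render_env_file(index_id: str, access_key_id: str, secret_access_key: str,
                    region: str = "us-east-1", session_token: str = "") -> str:
    """Render the contents of .env.development.local for the sample app."""
    settings = [
        ("REACT_APP_INDEX", index_id),
        ("REACT_APP_AWS_ACCESS_KEY_ID", access_key_id),
        ("REACT_APP_AWS_SECRET_ACCESS_KEY", secret_access_key),
        ("REACT_APP_AWS_SESSION_TOKEN", session_token),  # left blank in this use case
        ("REACT_APP_AWS_DEFAULT_REGION", region),
    ]
    return "".join(f"{key}={value}\n" for key, value in settings)


# Usage sketch: run from the aws-kendra-sample-app-master folder.
#   with open(".env.development.local", "w", encoding="utf-8") as handle:
#       handle.write(render_env_file("<index id>", "<access key>", "<secret key>"))
```

Because the file holds credentials, it belongs on the local machine only and should stay out of version control.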

Clean up

To avoid unnecessary costs and to clear out unused roles and policies, delete the resources created earlier: the Amazon Kendra index, the S3 bucket, and the corresponding IAM roles.

  1. To delete the Amazon Kendra index, under Indexes, select kendra-blog-index.
  2. In the index settings section, select Delete from the Actions drop-down menu.

Wait until you receive a message confirming the deletion; the entire process can take up to 15 minutes.

For instructions on deleting S3 buckets, see How to delete S3 buckets?
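Deleting the index can be scripted too. This is a hedged sketch assuming the boto3 Kendra client; the helper names are hypothetical. It only inspects the first page returned by list_indices, which is an assumption that holds for accounts with few indexes. The default polling budget of 30 checks at 30-second intervals matches the 15 minutes mentioned above.

```python
import time


def index_exists(kendra_client, index_id: str) -> bool:
    """Check the first page of list_indices for the given index ID."""
    items = kendra_client.list_indices().get("IndexConfigurationSummaryItems", [])
    return any(item.get("Id") == index_id for item in items)


def delete_index_and_wait(kendra_client, index_id: str,
                          poll_seconds: float = 30.0, max_polls: int = 30) -> bool:
    """Delete the index, then poll until it no longer appears in list_indices."""
    kendra_client.delete_index(Id=index_id)
    for _ in range(max_polls):
        if not index_exists(kendra_client, index_id):
            return True
        time.sleep(poll_seconds)
    return False  # still listed after the polling budget ran out
```

The S3 bucket and the IAM roles are separate resources and still need to be removed on their own.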

Summary

This article describes how to use Amazon Kendra to deploy an enterprise search service. With Amazon Kendra, which is powered by machine learning, you can improve the search experience within your company and retrieve documents quickly using natural language, without any machine learning or AI experience.

For more details about Amazon Kendra, see the keynote speech by Andy Jassy at AWS re:Invent 2019, the Amazon Kendra FAQ, and What is Amazon Kendra.