Query page contents with SharePoint and M365 Search


Every time I look at the search capabilities of SharePoint and Microsoft 365, there’s no doubt they’ve come a long way since the early days. There is also an expectation that search “just works” - predominantly because of the extraordinary capability of Google’s product and the perception that it picks up all online content, with no intervention. There is an SEO industry that specialises in making publicly available content successfully searchable by Google.

SharePoint’s search capability is highly customisable. Content metadata drives a great part of the custom capabilities of SharePoint search, and from what we’ve seen so far of the integration of SharePoint search into the Microsoft 365 search roadmap, the ability to continue using customised search based on content metadata is not going away.

Searching page contents

Sometimes when you have a custom search experience, there is a need to be quite specific about which parts of the content are searched, and which parts are not. Take, for example, an experience where the content of a page is important, and there is metadata present with similar keywords that shouldn’t actually be searched. I’ve often developed search experiences where the user has a series of refiners, which may include department names like “Human Resources” and “Information Technology”. With these types of refiners, it makes for a poor user experience when a user searches for the word “human” and the results include all content identified by the “Human Resources” refiner. It’s important to separate the keyword search capability from the refiner capability.

When a SharePoint search is executed with just keywords in a search box, the search engine looks at the full-text index and returns all content relevant to those keywords. The full-text index includes the contents of all pages, both modern and classic, so a basic keyword search will return any page relevant to the search query.

The advanced search scenario described above calls for what’s known as a property restricted query. This style of search query matches the search words against a particular property only, and excludes all other properties from the match. For example, using a property restricted query, a search can be performed where the words “Human Resources” exist in the page contents, but not necessarily in a page’s metadata.

Page content is, by default, included in SharePoint’s search index. This is true for classic publishing sites, and is also true for modern Communications sites, even though the underlying metadata / column name is different for the two types of sites.

Classic pages

The classic publishing site page stores its content in a field named:

  • PublishingPageContent

The field schema of this field is as follows:

<Field ID="{F55C4D88-1F2E-4ad9-AAA8-819AF4EE7EE8}" Name="PublishingPageContent" StaticName="PublishingPageContent" SourceID="http://schemas.microsoft.com/sharepoint/v3" Group="Page Layout Columns" DisplayName="Page Content" Description="Page Content is a site column created by the Publishing feature. It is used on the Article Page Content Type as the content of the page." Type="HTML" Required="FALSE" Sealed="TRUE" RichText="TRUE" RichTextMode="FullHtml" Customization="" />

Modern pages

The modern page stores its content in a field named:

  • CanvasContent1

The field schema of this field looks like this:

<Field ID="{4966388E-6E12-4BC6-8990-5B5B66153EAE}" Name="CanvasContent1" StaticName="CanvasContent1" DisplayName="Authoring Canvas Content" Type="HTML" SourceID="http://schemas.microsoft.com/sharepoint/v3" Group="_Hidden" Description="This column stores the content of the authoring canvas in a site page." AllowDeletion="FALSE" ShowInNewForm="FALSE" ShowInEditForm="FALSE" ShowInDisplayForm="FALSE" ShowInViewForms="FALSE" ShowInListSettings="FALSE" ShowInVersionHistory="FALSE" RichText="TRUE" RichTextMode="FullHtml" Customization="" />

Crawled properties

Core to the customisation capabilities of search are two key components: crawled properties and managed properties (called mapped properties in the rest of this post). Crawled properties are defined by the content that is crawled. Every column in the SharePoint content being crawled gets a crawled property.

When a column like the classic page’s PublishingPageContent is present, two crawled properties are automatically present in the search index:

  • ows_PublishingPageContent
  • ows_r_HTML_PublishingPageContent

… and the modern page’s CanvasContent1 column has two crawled properties in the search index:

  • ows_CanvasContent1
  • ows_r_HTML_CanvasContent1
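The naming pattern in the two lists above can be sketched in a few lines of Python. This is only an illustration inferred from those two examples: the function name is hypothetical, and the "ows_r_HTML_" prefix is assumed to apply to Type="HTML" columns only, since those are the only columns shown here.

```python
from __future__ import annotations


def html_column_crawled_properties(column_name: str) -> list[str]:
    """Return the two crawled property names seen above for an HTML column.

    Assumption: the pattern is taken from the PublishingPageContent and
    CanvasContent1 examples only; other column types are not covered.
    """
    return [f"ows_{column_name}", f"ows_r_HTML_{column_name}"]


for column in ("PublishingPageContent", "CanvasContent1"):
    print(column, "->", html_column_crawled_properties(column))
```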

Mapped properties

Out of the crawled properties that contain page content, only two of them are mapped to a mapped property:

Crawled property                  Full-text indexed  Mapped property
ows_PublishingPageContent         Yes                (none)
ows_r_HTML_PublishingPageContent  Yes                PublishingPageContentOWSHTML
ows_CanvasContent1                Yes                (none)
ows_r_HTML_CanvasContent1         No                 CanvasContent1OWSHTML

Criteria for a property restricted query against page content

To have a crawled property available in a property restricted query, three criteria need to be fulfilled:

  1. The crawled property must be included in the full-text index
  2. The crawled property must be mapped to a mapped property
  3. The mapped property must be queryable

By default, none of the crawled-to-mapped property configurations meets all of the above criteria. Here is the same table from above, with an extra column indicating whether the mapped property is queryable:

Crawled property                  Full-text indexed  Mapped property               Queryable flag enabled
ows_PublishingPageContent         Yes                (none)                        (n/a)
ows_r_HTML_PublishingPageContent  Yes                PublishingPageContentOWSHTML  No
ows_CanvasContent1                Yes                (none)                        (n/a)
ows_r_HTML_CanvasContent1         No                 CanvasContent1OWSHTML         Yes

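Based on the above criteria, neither existing mapped property can be used as-is. The mapped property named CanvasContent1OWSHTML is queryable, but its underlying crawled property, ows_r_HTML_CanvasContent1, is not full-text indexed, so content inside it will not be searched. PublishingPageContentOWSHTML has the opposite problem: its crawled property is full-text indexed, but the mapped property itself is not queryable.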
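To make the check concrete, here is a minimal Python sketch that applies the criteria to the default configuration listed in the table. The class and function names are hypothetical, and the values are copied by hand from the table rather than read from a tenant.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CrawledProperty:
    name: str
    full_text_indexed: bool
    mapped_property: Optional[str]  # None when nothing is mapped
    queryable: Optional[bool]       # None when there is no mapped property


# Default configuration, transcribed from the table above.
DEFAULTS = [
    CrawledProperty("ows_PublishingPageContent", True, None, None),
    CrawledProperty("ows_r_HTML_PublishingPageContent", True,
                    "PublishingPageContentOWSHTML", False),
    CrawledProperty("ows_CanvasContent1", True, None, None),
    CrawledProperty("ows_r_HTML_CanvasContent1", False,
                    "CanvasContent1OWSHTML", True),
]


def failed_criteria(prop: CrawledProperty) -> List[str]:
    """List which of the three criteria the property does not meet."""
    failures = []
    if not prop.full_text_indexed:
        failures.append("not in the full-text index")
    if prop.mapped_property is None:
        failures.append("no mapped property")
    elif not prop.queryable:
        failures.append("mapped property is not queryable")
    return failures


def usable_in_property_restricted_query(prop: CrawledProperty) -> bool:
    """A property qualifies only when it fails none of the criteria."""
    return not failed_criteria(prop)


for prop in DEFAULTS:
    failures = failed_criteria(prop)
    print(f"{prop.name}: {'OK' if not failures else '; '.join(failures)}")
```

Every row reports at least one failure, which is why a new mapped property is needed.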

How to configure search for a property restricted query against page content

When configuring SharePoint’s search for custom scenarios, it is best to leave as much of the default configuration intact as possible. This ensures other solutions that rely on the default configuration are not adversely affected, and your fellow M365 professionals and consultants who work on the same tenant will avoid tearing their hair out trying to figure out why this tenant’s search behaves differently from the others.

The simplest way to implement a configuration that allows for property restricted queries is to add a new mapped property and configure it as follows:

  • Queryable: Yes
  • Mapped to crawled properties: ows_PublishingPageContent, ows_CanvasContent1

Because the mapped property is queryable, its name can be used in a search query to restrict the search text to that property alone. For example, if the property is named BodyContent, the following search query will only search the page contents for ‘human resources’ - no other page metadata will be searched:

BodyContent:"human resources"
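As a rough sketch of how such a query could be sent to the SharePoint Search REST endpoint (/_api/search/query), the Python below builds the query text and the request URL. The helper names and the contoso site URL are placeholders, BodyContent is the example property name from above, and the phrase is wrapped in double quotation marks because that is how KQL delimits a phrase. Authentication and the request itself are left out.

```python
from urllib.parse import quote


def property_restricted_kql(mapped_property: str, phrase: str) -> str:
    """Build a KQL property restriction such as BodyContent:"human resources"."""
    # KQL phrases are delimited by double quotation marks, so drop embedded ones.
    cleaned = phrase.replace('"', "")
    return f'{mapped_property}:"{cleaned}"'


def search_rest_url(site_url: str, kql: str) -> str:
    """Build a GET URL for the search REST endpoint with the given query text."""
    # querytext is passed as a single-quoted string literal; a literal
    # single quote inside it is escaped by doubling it.
    literal = kql.replace("'", "''")
    return f"{site_url.rstrip('/')}/_api/search/query?querytext='{quote(literal, safe='')}'"


kql = property_restricted_kql("BodyContent", "human resources")
print(kql)
# Placeholder site URL, for illustration only.
print(search_rest_url("https://contoso.sharepoint.com/sites/intranet", kql))
```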

It’s worth noting at this point that the approach described above doesn’t take into account the potential impact the configuration has on the performance of the search index. That impact is largely hidden from administrators of a Microsoft 365 environment, so it is hard to measure. My advice to customers and partners is to make the change first in a comparable test environment.