@kliemohn - nice job with this blog post, great to see so many comments.
Principal Software Engineer Kirk Liemohn shares a couple of the dirty little secrets about SharePoint 2010 Search Security Trimming and what you need to know to stay out of trouble…
Introduction – What is Security Trimming?
Security trimming is simply the act of filtering out content that should not be accessible (typically readable) for a given user. It is a core concept within SharePoint that affects what navigation elements you see, what sites you have access to, what lists and libraries you have access to, and what list items you see. It is also a core concept within SharePoint search. Security trimming in SharePoint search comes in two flavors: indexed security trimming and query-time security trimming.
Indexed Security Trimming
Security trimming within the index is possible when the crawler can obtain Access Control Lists (ACLs) for each item and store them in the index. It is the preferred approach because it is faster. There is little that is “dirty” about this approach. This is used for SharePoint content as well as file shares and other content sources. Outside of some minor storage and crawl processing costs, the only real downside is that permission changes are not reflected until the next incremental crawl.
Query-Time Security Trimming
Sometimes you do not have ACLs available at index time and you must resort to query-time security trimmers. This is the case when crawling web sites since there is no way to ask a website “who has access to this page?” This may also be the case with Business Connectivity Services (BCS) when you are crawling content from a database or web service. BCS can use indexed security trimming, but only if you can make ACLs available through your external system.
The rest of this post focuses on query-time security trimming and how this has changed with SharePoint 2010. Most of it is straightforward, but there are a couple of little “dirty secrets” you’ll need to be aware of. First I’ll give an implementation overview, then I’ll show what I have observed, and finally I’ll focus on two gotchas that are in SharePoint 2010.
Implementing a Custom Security Trimmer
Query-time security trimming in SharePoint 2010 is very similar to SharePoint 2007. They both require the server edition of the product (no WSS 3.0 or SharePoint Foundation). In both cases you write a class that implements an interface and then you register that security trimmer with a crawl rule. For SharePoint 2007 you use the ISecurityTrimmer interface, whereas in SharePoint 2010 you use the ISecurityTrimmer2 interface. A good reference for writing security trimmers can be found in Writing a Custom SecurityTrimmer for SharePoint Server Search.
This interface is very simple. It has an Initialize() method that runs once and can provide your class information specified when registering the security trimmer. This method is identical for SharePoint 2007 and SharePoint 2010.
The only other method is a CheckAccess method. This is the more interesting of the two. This is where the query is actually happening. This method returns a BitArray describing whether the current user has access to each URL provided. It has some differences between SharePoint 2007 and SharePoint 2010; most notably that in SharePoint 2010 you are provided an IIdentity. In SharePoint 2007 you had to use WindowsIdentity.GetCurrent() or HttpContext.Current.User depending on the authentication method.
The CheckAccess method can run multiple times for a single query. What the query engine is doing is providing your security trimmer a batch of URLs expecting to find out which ones can be provided to the current user for search results. So, your security trimming code is running during the query – potentially multiple times. If the security trimmer does not give the query engine enough hits, it will provide the next batch of URLs to your security trimmer – and will continue to do so until it has what it considers enough URLs or until it has exhausted all hits for the search request.
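To make the contract concrete, here is a minimal sketch in Python rather than .NET. It is only a model of the idea: the real trimmer is a class implementing ISecurityTrimmer2 that returns a BitArray. The ALLOWED table, the example.com URLs, and the user names are hypothetical stand-ins for whatever external system actually knows the permissions. The point it illustrates is that the result has exactly one flag per URL, in the same order as the batch that was passed in.

```python
from typing import Dict, List, Set

# Hypothetical permission table standing in for the external system.
ALLOWED: Dict[str, Set[str]] = {
    "http://example.com/a": {"alice", "bob"},
    "http://example.com/b": {"bob"},
    "http://example.com/c": set(),
}


def check_access(urls: List[str], user_name: str) -> List[bool]:
    """Return one flag per URL, in the same order as the incoming batch.

    URLs that are unknown to the permission table are treated as not accessible.
    """
    return [user_name in ALLOWED.get(url, set()) for url in urls]


batch = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
flags = check_access(batch, "alice")

# The query engine keeps only the URLs whose flag is set.
visible = [url for url, ok in zip(batch, flags) if ok]
```

Because the flags are matched to URLs purely by position, a trimmer that reorders or drops entries would silently expose or hide the wrong results.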
Since the security trimmer is running during the query, it needs to be quick and efficient. However, while your security trimmer code is running you are likely doing some processing to find out if the current user has access to a URL. This may involve a web service call to another system. This may be fine, but I recommend that you test the performance thoroughly just to be sure. Unfortunately, since SharePoint 2010 only provides an IIdentity, you are fairly limited since this does not provide you as much information as what you have access to in SharePoint 2007. The context the security trimmer runs in is not the same context you are used to for your typical server-side SharePoint code, so you may have to make additional calls to look up user information or other context-specific data. If so, you’ll want to factor that into your performance costs.
In SharePoint 2007, the security trimmer was typically run enough so that the query engine could provide a page-worth of URLs to the user running the query. Since the default query page size was 10, it may have stopped after having 20 or so URLs. In SharePoint 2010, however, it not only wants enough URLs to show a page-worth of results to the user, but it also wants enough to show appropriate refiners. By default, I believe that this means that the SharePoint 2010 query engine wants 50 results (after trimming).
Depending on how many search results there are before trimming and how many are trimmed out, your security trimmer may be called a large number of times. Using the default settings with a large result set and with the majority of search results trimmed out, my observations have shown me that the first time the security trimmer is run it is provided 50 URLs. If you trim some out, the security trimmer is immediately called again with another 75 URLs, and this is done before the user sees any search results. That continues with batches of 75 URLs until enough results are satisfied or until the total result set is exhausted. Using trace statements and DebugView I can watch what is going on. The trace output below shows how many URLs are requested for a single query. In this case the security trimmer is called 6 times and given a total of 425 URLs.
In the output below, the first line is when Initialize is called. Each subsequent line indicates an individual call to CheckAccess.
00000000 0.00000000  [CustomSecurityTrimmer] Initializing
00000001 0.00353188  [CustomSecurityTrimmer] Count = 50 - Total = 50.
00000002 0.05990715  [CustomSecurityTrimmer] Count = 75 - Total = 125.
00000003 0.10741841  [CustomSecurityTrimmer] Count = 75 - Total = 200.
00000004 0.16418955  [CustomSecurityTrimmer] Count = 75 - Total = 275.
00000005 0.21421386  [CustomSecurityTrimmer] Count = 75 - Total = 350.
00000006 0.26570737  [CustomSecurityTrimmer] Count = 75 - Total = 425.
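The first-page pattern can be modeled with a short simulation. This is a sketch of the behavior I observed, not the query engine's actual algorithm: the batch sizes of 50 and 75 come from the trace, the target of 50 accepted results is my belief stated earlier, and the function name, the 425 placeholder hits, and the deny-everything trimmer are assumptions chosen so that the whole result set gets walked.

```python
from typing import Callable, List


def simulate_first_page(hits: List[str],
                        trimmer: Callable[[List[str]], List[bool]],
                        wanted: int = 50,
                        first_batch: int = 50,
                        next_batch: int = 75) -> List[int]:
    """Return the size of each batch handed to the trimmer for one query."""
    batch_sizes: List[int] = []
    accepted = 0
    position = 0
    size = first_batch
    # Keep feeding batches until enough URLs pass or the hits run out.
    while position < len(hits) and accepted < wanted:
        batch = hits[position:position + size]
        flags = trimmer(batch)
        accepted += sum(flags)
        batch_sizes.append(len(batch))
        position += len(batch)
        size = next_batch  # every batch after the first uses the larger size
    return batch_sizes


def deny_all(urls: List[str]) -> List[bool]:
    """Worst case: the user has access to nothing."""
    return [False] * len(urls)


def allow_all(urls: List[str]) -> List[bool]:
    """Best case: nothing is trimmed."""
    return [True] * len(urls)


hits = [f"http://example.com/doc/{n}" for n in range(425)]
worst_case = simulate_first_page(hits, deny_all)
best_case = simulate_first_page(hits, allow_all)
```

With 425 hits and nothing accepted, the model produces one batch of 50 followed by five batches of 75, which matches the counts in the trace. When nothing is trimmed, a single batch of 50 is enough.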
The algorithm changes once the user clicks to view the second page of results. The initial count of URLs provided to the security trimmer for the second page is 60. It is 70 for the third, and then seems to max out at 76. The pattern is a little more complex, so it is probably best just to show you some data. Notice how the total resets on subsequent page requests. That is because for each page we are starting a new set of processing as far as the security trimmer is concerned.
00000000 0.00000000  [CustomSecurityTrimmer] Initializing
00000001 0.00489843  [CustomSecurityTrimmer] Count = 50 - Total = 50.
00000002 0.06208209  [CustomSecurityTrimmer] Count = 75 - Total = 125.
00000003 30.23514748  [CustomSecurityTrimmer] Count = 60 - Total = 60. (Page 2)
00000004 30.27280998  [CustomSecurityTrimmer] Count = 75 - Total = 135.
00000005 45.76335526  [CustomSecurityTrimmer] Count = 70 - Total = 70. (Page 3)
00000006 45.79971313  [CustomSecurityTrimmer] Count = 75 - Total = 145.
00000007 59.88526917  [CustomSecurityTrimmer] Count = 76 - Total = 76. (Page 4)
00000008 59.88538742  [CustomSecurityTrimmer] Count = 4 - Total = 80.
00000009 59.92379761  [CustomSecurityTrimmer] Count = 75 - Total = 155.
00000010 59.97734451  [CustomSecurityTrimmer] Count = 75 - Total = 230.
00000011 82.77108765  [CustomSecurityTrimmer] Count = 76 - Total = 76. (Page 5)
00000012 82.77122498  [CustomSecurityTrimmer] Count = 14 - Total = 90.
00000013 82.84067535  [CustomSecurityTrimmer] Count = 75 - Total = 165.
00000014 82.93080902  [CustomSecurityTrimmer] Count = 75 - Total = 240.
00000015 99.88347626  [CustomSecurityTrimmer] Count = 76 - Total = 76. (Page 6)
00000016 99.88369751  [CustomSecurityTrimmer] Count = 24 - Total = 100.
00000017 99.94698334  [CustomSecurityTrimmer] Count = 75 - Total = 175.
00000018 100.04191589  [CustomSecurityTrimmer] Count = 75 - Total = 250.
00000019 106.22407532  [CustomSecurityTrimmer] Count = 76 - Total = 76. (Page 7)
00000020 106.22446442  [CustomSecurityTrimmer] Count = 34 - Total = 110.
00000021 106.30342865  [CustomSecurityTrimmer] Count = 75 - Total = 185.
00000022 106.39843750  [CustomSecurityTrimmer] Count = 75 - Total = 260.
00000023 111.46296692  [CustomSecurityTrimmer] Count = 76 - Total = 76. (Page 8)
00000024 111.46327972  [CustomSecurityTrimmer] Count = 44 - Total = 120.
00000025 111.52843475  [CustomSecurityTrimmer] Count = 75 - Total = 195.
00000026 111.62356567  [CustomSecurityTrimmer] Count = 75 - Total = 270.
If you are processing too much within a page (a single request), then you can throw a PluggableAccessCheckException. This is a way for the security trimmer to throw up its hands and basically give up. A good overview of how to use this is found on Walkthrough: Using a Custom Security Trimmer for SharePoint Server Search Results, but read further below as there is a problem with this in SharePoint 2010. With this exception you can provide a message to show the end user; it basically tells them to refine their search. This is an important part of a security trimmer because you don’t want to leave the user hanging too long. If the query lasts too long, the thread will be aborted. By default this is 90 seconds.
OK, now it’s time to dish some dirt…
Dirty Secret #1 – PluggableAccessCheckException
As mentioned above, the purpose of the PluggableAccessCheckException is to allow the security trimmer to tell the query engine to stop processing so a response can be given to the user performing the query without taking too much time. Unfortunately, with SharePoint 2010 the opposite occurs. If you throw a PluggableAccessCheckException what happens is that your security trimmer will be called with all URLs in the result set (typically in batches of 75) unless the thread is aborted before you reach the end of the result set. This is even the case if the security trimmer does not trim out any results after throwing the exception. In the output below I have set a low limit of 150 URLs after which I throw the exception.
00000000 0.00000000  [CustomSecurityTrimmer] Initializing
00000001 0.00339907  [CustomSecurityTrimmer] Count = 50 - Total = 50.
00000002 0.05012119  [CustomSecurityTrimmer] Count = 75 - Total = 125.
00000003 20.21066093  [CustomSecurityTrimmer] Count = 60 - Total = 60. (Page 2)
00000004 20.25085640  [CustomSecurityTrimmer] Count = 75 - Total = 135.
00000005 40.40690613  [CustomSecurityTrimmer] Count = 70 - Total = 70. (Page 3)
00000006 40.45227051  [CustomSecurityTrimmer] Count = 75 - Total = 145.
00000007 57.72388458  [CustomSecurityTrimmer] Count = 76 - Total = 76. (Page 4)
00000008 57.72399139  [CustomSecurityTrimmer] Count = 4 - Total = 80.
00000009 57.76459122  [CustomSecurityTrimmer] Count = 75 - Total = 155.
00000010 57.76465225  [CustomSecurityTrimmer] Exceeded Limit of 150. Count = 75 - Total = 155. Throwing PluggableAccessCheckException
00000011 57.93753815  [CustomSecurityTrimmer] Count = 75 - Total = 230.
00000012 57.93761063  [CustomSecurityTrimmer] Exceeded Limit of 150. Count = 75 - Total = 230. Throwing PluggableAccessCheckException
... (skipping 8 sets of trace statements) ...
00000029 58.56541061  [CustomSecurityTrimmer] Count = 75 - Total = 905.
00000030 58.56551361  [CustomSecurityTrimmer] Exceeded Limit of 150. Count = 75 - Total = 905. Throwing PluggableAccessCheckException
00000031 58.65005875  [CustomSecurityTrimmer] Count = 75 - Total = 980.
00000032 58.65012360  [CustomSecurityTrimmer] Exceeded Limit of 150. Count = 75 - Total = 980. Throwing PluggableAccessCheckException
00000033 58.73397064  [CustomSecurityTrimmer] Count = 47 - Total = 1027.
00000034 58.73407364  [CustomSecurityTrimmer] Exceeded Limit of 150. Count = 47 - Total = 1027. Throwing PluggableAccessCheckException
Last I heard, the product group confirmed the behavior and said that since multiple pluggable trimmers can be registered, search cannot be stopped on throwing the exception. I assume that they are still trying to prevent continuously calling the security trimmer that threw the exception and hope that they will consider all results trimmed that are associated to a crawl rule in which a security trimmer threw the exception.
Bottom line: You may want to consider the risks of your custom security trimmer being called too many times and running too long. A workaround may be to simply return False for every URL that is provided to your security trimmer once you determine that you have been running too long. This could be done without doing any expensive processing.
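Here is a rough sketch of that workaround, again in Python as a model rather than real trimmer code. The `session` dictionary stands in for whatever per-query state your trimmer can keep between calls, and URL_LIMIT, the "gave_up" and "total" keys, and the `expensive_check` callback are all hypothetical names. The idea is that once the limit is crossed, every later batch is denied immediately without touching the external system.

```python
from typing import Callable, Dict, List

URL_LIMIT = 150  # hypothetical cap on URLs examined for a single query


def check_access(urls: List[str],
                 session: Dict[str, object],
                 expensive_check: Callable[[List[str]], List[bool]]) -> List[bool]:
    """Deny everything cheaply once the per-query limit has been exceeded."""
    if session.get("gave_up"):
        # Already over the limit: answer without any external calls.
        return [False] * len(urls)

    total = int(session.get("total", 0)) + len(urls)
    session["total"] = total
    if total > URL_LIMIT:
        # Remember the decision so later batches take the fast path above.
        session["gave_up"] = True
        return [False] * len(urls)

    return expensive_check(urls)


# Demo: count how often the (pretend) expensive check actually runs.
calls = []


def slow_check(urls: List[str]) -> List[bool]:
    calls.append(len(urls))
    return [False] * len(urls)


session: Dict[str, object] = {}
results = [check_access(["u"] * size, session, slow_check)
           for size in (50, 75, 75, 75)]
```

In this run only the first two batches reach the slow check. The third batch pushes the total past 150, and from then on each call returns a list of False values right away. This does not stop the trimmer from being called, but it keeps each call cheap.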
Dirty Secret #2 – FAST Search for SharePoint 2010
Unfortunately, query-time security trimmers are not supported with FAST for SharePoint (FS4SP). The FAST pipeline is different and although FAST is a flexible platform, I am told that FAST Search for SharePoint 2010 does not currently have a feasible way to make this happen.
I believe this is because FS4SP has deep refinements (among other features). Without FAST, SharePoint provides refiner information based on the first 50 results, but with FAST the exact count can be provided for each refiner. This exact count cannot be provided unless all of the results are processed, which just isn’t feasible with a query-time security trimmer (which can conceivably have tens of thousands of hits or more for a single query).
Last I heard this was not currently supported, but that R&D is actively working on a design change to allow for this functionality in a future release.
Security trimming is a great thing and a powerful feature used liberally throughout SharePoint. Security trimming with SharePoint search is crucial, especially when search results show hit highlighting of content. If possible, use security trimming at index time. If you must use query-time security trimming, go in with your eyes wide open.
I don't think you can crawl ACLs / Security Descriptors with the web site protocol handler. The BCS crawl through the search connector framework does allow for this. See the WindowsSecurityDescriptorField field on http://msdn.microsoft.com/en-us/library/ee556429.aspx for more details.
You state "Security trimming within the index is possible when the crawler can obtain Access Control Lists (ACLs) for each item and store them in the index." However, you don't expand on this method and I can't find any information on how the ACLs get into the index. For Web content-type crawls, how does one provide the ACLs for storage in the index? I can't find a meta tag to configure with this information.
Hi Kirk, I am using SharePoint 2010 enterprise search, and I would like to show all items to the end user even if he does not have access to them. In short, can I disable the built-in security trimmers of SharePoint Search, or can I create a custom trimmer that overrides the existing security trimmer so users would see all items in the search results? Thanks in advance, Ajay Sawant
@rg, you'll want to make a batch call to your database and provide all Ids in the batch. For every call to the security trimmer it may contain up to 50 or even 75 URLs. You should call your database with the corresponding Id for each URL and have it return a result such that you can match up which ones are accessible and which are not. This still does not handle the case where there are somehow 100,000 hits from the query and the user only has access to a small number. In that case you need your security trimmer to keep track of how many times it has been called for the query (or maybe the amount of total time for the query) and have it stop making calls to database after some limit. The ideal approach is to have the database return a security descriptor as part of the crawl and avoid a query time security trimmer. That may or may not be feasible in your environment.
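As a rough illustration of the batch call, here is a Python sketch that resolves a whole batch of URLs with a single query. An in-memory SQLite database stands in for the real external database, and the record_access table, its columns, the example.com URL scheme, and the helper names are all hypothetical.

```python
import sqlite3
from typing import List


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory table of (record, user) access grants."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE record_access (record_id INTEGER, user_name TEXT)")
    conn.executemany("INSERT INTO record_access VALUES (?, ?)",
                     [(1, "alice"), (3, "alice"), (2, "bob")])
    return conn


def id_from_url(url: str) -> int:
    """Hypothetical URL scheme: the record id is the last path segment."""
    return int(url.rstrip("/").rsplit("/", 1)[-1])


def check_access_batch(conn: sqlite3.Connection,
                       urls: List[str],
                       user_name: str) -> List[bool]:
    """Resolve access for the whole batch with one database round trip."""
    ids = [id_from_url(url) for url in urls]
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        "SELECT record_id FROM record_access "
        f"WHERE user_name = ? AND record_id IN ({placeholders})",
        [user_name, *ids],
    )
    allowed = {row[0] for row in rows}
    # Map the answer back onto the batch, preserving the incoming order.
    return [record_id in allowed for record_id in ids]


conn = make_demo_db()
urls = [f"http://example.com/records/{n}" for n in (1, 2, 3)]
alice_flags = check_access_batch(conn, urls, "alice")
```

With batches of 50 to 75 URLs this means one round trip per call to the trimmer instead of one per item.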
Hi Kirk, I have a scenario where a user has access to only a particular record in a database table which has around 100,000 rows. Since the OOB BCS "AccessChecker" checks one item at a time, there can be a performance issue (I have to do a database call for each and every item when "AccessChecker" is called). Let's say the user searches for brt*; it will return 100 rows and the user only has access to the first and last items. What is the best way to handle this scenario? Thank you.
I have profile pages set up for my BCS. When I use CST on the search results the CheckAccess method in the ISecurityTrimmer2 receives the urls starting with bdc3:// Is this also a limitation?
@Kirk, I "implemented the a) solution withOUT success" :) In fact, even if I make the trimmer answer "false" very quickly, I end up with a timeout because of the large number of results to trim and the delay between calls to the trimmer...
Thank you very much for your answer. I've already implemented the a) solution without success, I'll let you know if I got the KB.
@bcolin, you are welcome. I'm glad that this post was helpful. I had gone through a lot and wanted to share what I learned. It is good to see that it is helping others. Regarding "a way to tell the query engine to stop calling the security trimmer", I'm afraid that your only options are: a) If you reach a limit where you want to stop, save a flag in your session properties and whenever that is set return a BitArray of false values that you quickly construct without calling any external/expensive code. The BitArray constructor takes an Int32 to create a collection initially set to false (exactly what you want). This doesn't prevent your security trimmer from being called, but it does prevent it from running long. Do NOT throw the PluggableAccessCheckException as that will just ensure that you are called for the entire search result set. b) Put pressure on Microsoft. I tried to get a KB article out of my efforts and was not able to at the time. Maybe they have one by now.
Hi! You said that "search cannot be stopped on throwing the PluggableAccessCheckException". I can confirm this behavior in my environment: whether my trimmer throws the exception or not, my security trimmer is still called for all search results, leading to a timeout on the query thread and an error in the UI. Is there a way to tell the query engine to stop calling the security trimmer? By the way, thank you very much for this article, it gave me very valuable information.
Don, keep in mind that this sort of security trimming only occurs in custom scenarios. For most content (including SharePoint content), the crawl process stores the ACLs (permission information) in the index with the content. However, this is not possible in all scenarios. If, for example, you crawl another web site that is not SharePoint, the crawler (aka Protocol Handler) has no way of asking the web site who has access to a particular page. It just knows that it has access. So, in that case it cannot store any ACLs in the index and search results will not be security trimmed unless a custom security trimmer is written and registered.
Thank you so much for this article. I knew that search results were security trimmed but I never realized that it wasn't done real-time. I have been supporting SharePoint for several years now and this article explains so many of the "issues" that my end users have had.
Hi Kirk, Thanks for your input... I did think of creating a custom security trimmer, but the post below says that this is not a possible solution because we cannot override the default security trimmer. Because of this, my custom security trimmer would only get the pre-trimmed results. http://msdn.microsoft.com/en-us/library/bb608305(v=office.12).aspx#1 If I crawl the "SharePoint Sites" as a normal "Web Site", I am afraid that I might lose prime functionality like the "Search Refinement Panel" and displaying managed metadata columns in search results. As a final thought, I am thinking of creating a separate web application for search and disabling identity impersonation in web.config so the end user could get all untrimmed results when searching, because the app pool identity [instead of the current user identity] would be passed down for all the operations. Do you think this would be a promising solution to my requirement, or would it create any issues for me in the future? Thanks again for your help. Ajay Sawant
@Ajay, that is an interesting request. The short answer is that I do not know. By default SharePoint will use the protocol handler for crawling SharePoint content which will store the ACLs in the index. You can derive the protocol handler by looking at the "Local SharePoint sites" content source (that's the default name in SP2010) and see that it specifies that the Content Source Type is "SharePoint Sites". I suppose you could register a security trimmer for your SharePoint content source, but I suspect that it would only provide URLs to your security trimmer for content that the current user can see (I have never tried registering a security trimmer on content that has ACLs in the index). It might be worth a shot, but it would be a long shot. Another approach might be to disable the "Local SharePoint sites" content source or remove some of the start addresses from it, then add a new content source using a "Web Sites" content Source Type. This should use a protocol handler that does not store the ACLs in the index. This is a less efficient way to crawl SharePoint content, though, and you would likely see different results for search requests (the relevancy will likely be different and some content may not get crawled under certain circumstances).
To make a batch call to the DB the OOB BCS “AccessChecker” wouldn't work because it can only process one request at a time. I should go for a custom security trimmer as you mentioned in the example above. Thank you.
Thanks Kirk. Now I'm using the BCS CheckAccess method for the security trim. Everything seems to work ok.
@rg, it may not be a limitation, but it is a pain. If you want to dig into the URL to learn more about it, see my Stack Overflow question and answer here: http://stackoverflow.com/questions/3817142/how-do-i-load-bdc-data-from-a-bdc-url
@bcolin, sorry to hear this. Do you think you are testing with realistic conditions? It shouldn't be very expensive to call the security trimmer if it does a no-op but it sounds like the delay between calls is too much (not sure what causes this delay). Do you know how many times your security trimmer gets called before the thread timeout is hit? Based on my data above it looks like it took (in my environment) about 1 second to handle 1000 URLs. That indicates that 90k URLs could be a problem or roughly 1200 calls to the security trimmer at 75 URLs a pop. I know that throwing the PluggableAccessCheckException causes all URLs to be processed, but I wonder if that affects the delay time between calling the security trimmer. I don't think I ever measured that. Maybe it significantly decreases the time so throwing the exception might be a better option in some cases. I also wonder if any SharePoint 2010 updates could have fixed this issue or affected the behavior in any way. I did all of this work a year ago so maybe that is possible (not sure how up-to-date your environment is).
@bcolin, you say that you've "already implemented the a) solution without success". Was that a typo and you really were successful instead of unsuccessful? Otherwise I'm curious what your problems were.