#93: Optimize Plone for speed
- Proposed by
- Alexander Limi
- Proposal type
- Architecture
- Repository branch
- plip93-optimize-templates
- State
- completed
Motivation
It's important that Plone becomes faster - although Plone is made to be fronted with Apache/Squid caching, it makes sense to optimize page load speeds for big deployments - and it also makes a difference in the day-to-day development usage.
Assumptions
This proposal mostly concerns itself with optimization of page views for anonymous users. While editing speed matters too, it is more complex to optimize and is outside the scope of this PLIP, although it should obviously be a focus area in the future.
Proposal
After doing some initial, simple profiling, I identified some attack vectors that would give a lot of speed-up with little effort.
The areas where I think we should focus our attentions are:
- The breadcrumbs code
- The 'listMetaTags' method
- The calendar portlet
Here is some output from 10 page loads as Anonymous User of the front page of a newly created Plone site on my 1.5GHz PowerBook G4 running in debug mode - Plone 2.1 branch (Revision: 6338). I use PTProfiler as my main analysis tool for this.
The table has been edited to show only the main offenders speed-wise. I have highlighted cases of very expensive single calls or with an excessive number of calls to methods.
| Expression (partial listing) | Total time | Number of calls | Time per call |
|---|---|---|---|
| Total rendering time | 8.9213 | 10 | 0.89213 |
| python: portal.portal_actions.listFilteredActionsFor(here) | 0.7863 | 10 | 0.07863 |
| path: here/listMetaTags|nothing | 0.4551 | 10 | 0.04551 |
| python: portal.breadcrumbs(here) | 0.2081 | 10 | 0.02081 |
| python: here.getBeginAndEndTimes(day=daynumber, month=month, year=year) | 0.1694 | 60 | 0.00282 |
| path: day/event | 0.0672 | 700 | 0.0001 |
| python: current.year()==year and current.month()==month and current.day()==int(daynumber) | 0.0558 | 580 | 0.0001 |
| python: here.portal_url() + '/search?review_state=published &start.query:record:date=%s&start.range:record=max &end.query:record:date=%s &end.range:record=min'%(pss.url_quote(begin), pss.url_quote(end)) | 0.0455 | 60 | 0.00076 |
| path: day/day | 0.0384 | 350 | 0.00011 |
| python: '%d%0.2d%0.2d' % (year, month, daynumber) | 0.0258 | 350 | 7e-05 |
| python: test(current.year()==year and current.month()==month and current.day()==int(daynumber), 'todayevent', 'event') | 0.0186 | 60 | 0.00031 |
| python: DateTime(begEndTimes[0].timeTime()+86400).ISO() | 0.0184 | 60 | 0.00031 |
As you can see, the main offender speed-wise is the listFilteredActionsFor method - although there's not much we can do about this, since it's a CMF construct. Initial testing with CMF 1.5 (which has lazy action evaluation) didn't show any improvement here - so we'll skip this as a target for our optimizations.
I have grouped some of the methods from the calendar portlet at the bottom of the table, and given them an alternate background color. As you see, there are a number of calls here that are done an excessive amount of times.
Our main targets and some comments about each one:
- Breadcrumbs
- This code is essentially trying to do the same as the nav tree - just in a flat, depth-only way. It has lots of exception handling, "clever" and unnecessary code, and things that are total overkill for a breadcrumb implementation. If people want to support all the special cases, that's fine - but the default implementation should not. The fact that breadcrumb.py is 166 (!) lines long with permission checks and multiple conditional branches should be a good indicator of this.
  My suggestion: See if we can re-use code from the nav tree implementation and make it more efficient, and make it a bit stupider if necessary.
- listMetaTags method
- I have no idea why this thing is so expensive, but it is. Tiran recently moved it to a tool to see if running it in unrestricted code would make it faster, but it only made a marginal difference, easily attributable to testing variations.
- Calendar portlet
- This thing is a chapter in itself. It has an incredible amount of calls (I have only included the most exceptional ones, there are lots of others), and does multiple tal:defines inside tal:repeats, among other things.
  My suggestion: This code should be rewritten. It's currently building the table for the calendar in a very inefficient way, and we should also remove the pop-up divs and let it use the HTML title attribute instead, like the rest of Plone. This is also better for accessibility, and will remove half (well, almost ;) of the excessive white space in the Plone HTML output.
Progress log
April 3rd, 2005, limi:
Some interesting numbers from Plone 2.0.5, mainly showing that listMetaTags is less expensive here, and also that listFilteredActions takes less time here (what is adding a lot of actions in 2.1, and how can we minimize the impact?):
| Expression | Total time | Number of calls | Time per call |
|---|---|---|---|
| Total rendering time | 8.3012 | 10 | 0.83012 |
| python: portal.portal_actions.listFilteredActionsFor(here) | 0.5233 | 10 | 0.05233 |
| python: here.plone_utils.createNavigationTreeBuilder(portalObject,navBatchStart) | 0.4885 | 10 | 0.04885 |
| path: here/getAllowedTypes | 0.4598 | 10 | 0.04598 |
| python: here.CookedBody(stx_level=2) | 0.3094 | 10 | 0.03094 |
| path: here/listMetaTags|nothing | 0.1162 | 10 | 0.01162 |
| path: day/event | 0.1153 | 1050 | 0.00011 |
On a positive note, we see that getAllowedTypes takes up a lot of time in 2.0.5, and that it has been totally eliminated from the 2.1 anonymous view. Also eliminated is the nav tree cost, which is negligible in 2.1. All in all, we've eliminated about 1 second on the 10 page loads with the new stuff in 2.1, but something else is bogging us down.
Unfortunately, something is sucking up the CPU time we won with the optimizations. The AT-based types?
April 3rd, 2005, limi:
Investigated the listMetaTags part after Tiran moved it to unrestricted code - it is a bit faster, but not a lot. There are a lot of crazy checks and conditionals going on to produce DC.* tags that none of the web crawlers use, and that most interpret as line noise.
My proposal is to introduce a switch in site_properties called exposeDCMetaTags that is off by default, since no search engines or crawlers use it, and let the 3 people in the world (of which 2 are librarians and the last one flunked librarian school ;) turn it on with a performance penalty if they need it.
We only want meta name="description" in Plone by default, as this is the only one used by search engines - even keywords are of questionable usefulness.
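A minimal sketch of how such a switch could behave, in plain Python rather than the actual Plone code. The function name list_meta_tags, the dict standing in for a content object, the expose_dc argument and the choice of DC fields are all hypothetical; only the idea of an off-by-default exposeDCMetaTags property comes from the proposal.

```python
def list_meta_tags(content, expose_dc=False):
    """Return (name, value) pairs for the <meta> tags of a page.

    `content` is a plain dict standing in for a content object and
    `expose_dc` stands in for the proposed exposeDCMetaTags property.
    """
    tags = []
    description = content.get("description")
    if description:
        tags.append(("description", description))

    if not expose_dc:
        # Default path: skip the Dublin Core tags entirely.
        return tags

    # Opt-in path: emit DC.* tags for whichever fields are present.
    for field in ("title", "creator", "subject", "date"):
        value = content.get(field)
        if value:
            tags.append(("DC." + field, value))
    return tags


page = {"title": "Welcome", "description": "Front page", "creator": "limi"}
print(list_meta_tags(page))
print(list_meta_tags(page, expose_dc=True))
```

The point of the early return is that the default view never touches the per-field checks, so the cost is only paid by sites that turn the switch on.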
listMetaTags - before: 0.45s - after: 0.09s
Yay for Alec! His implementation of my suggestion slices the time to 1/5 of the previous usage and introduces the DC metadata switch. Next target is breadcrumbs. Tesdal has added breadcrumb support to ExtendedPathIndex, so it should be possible to make it significantly cheaper than it is now.
April 5th, 2005, limi:
breadcrumbs - before: 0.20s - after: 0.06s
Nice. Using Helge Tesdal's new ExtendedPathIndex (that also powers the new nav tree and the site map), Alec Mitchell implemented a version of the breadcrumbs that cuts rendering time to a third of the original.
April 6th, 2005, limi:
listFilteredActionsFor - before: 0.78s - after: 0.56s
A simple but effective speedup was to remove an unnecessary loop in listFilteredActionsFor. It cuts the time by almost a third, and is more effective the more actions you have, so it should really make a difference if you are logged in too. This is the final change we're doing on the branch, merge time!
May 25th, 2005, limi:
We had an interesting use case at a client site where they had 113(!) content types. This led us to a quite interesting discovery: listTypeInfo in CMF is extremely expensive when you get a lot of types.
Alec stepped up (as usual ;) and helped out. There is now a new method override in Plone's type tool that gets rid of the madness and uses a much more light-weight method to do the exact same job.
The result? listTypeInfo - before: 1.20s - after: 0.40s
Participants
Alexander Limi
Alec Mitchell
We tried this, but it didn't affect the speed
It didn't make a big difference - small enough to be just an artifact of slightly different conditions.
a few quick fixes
These two snippets probably appear in the calendar:
python: current.year()==year and current.month()==month and current.day()==int(daynumber)
python: test(current.year()==year and current.month()==month and current.day()==int(daynumber), 'todayevent', 'event')
The calendar is iterating over all days in the current month. The condition will fail about 97% of the time, but the way it is written, it won't fail until the third test. If you reverse the order of the tests, i.e. check current.day() == int(daynumber) first, you will reduce the computation required by about 2/3. It would be even smarter to define variables such as current_year = current.year() and int_day_number = int(daynumber), and pull them out of whatever loop is being iterated over.
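A rough illustration of the idea in plain Python. It uses datetime.date as a stand-in for the Zope DateTime object in the template (so year, month and day are attributes here, not method calls), and the function names are made up for the example.

```python
import datetime


def today_flags_naive(current, year, month, daynumbers):
    # Mirrors the original expression: three comparisons per day, with
    # the year and month tests repeated on every iteration.
    return [
        current.year == year and current.month == month and current.day == int(d)
        for d in daynumbers
    ]


def today_flags_hoisted(current, year, month, daynumbers):
    # Year and month are constant for the month being rendered, so
    # decide once whether "today" can fall in it at all.
    if current.year != year or current.month != month:
        return [False for _ in daynumbers]
    current_day = current.day
    # Only the day comparison remains inside the loop.
    return [current_day == int(d) for d in daynumbers]


current = datetime.date(2005, 4, 6)
days = [str(d) for d in range(1, 31)]
assert today_flags_naive(current, 2005, 4, days) == today_flags_hoisted(current, 2005, 4, days)
print(today_flags_hoisted(current, 2005, 4, days).index(True) + 1)  # → 6
```

Both versions return the same flags; the second simply does the invariant work once instead of once per day.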
listFilteredActionsFor and worklist action
I believe the workflow tool does a catalog search for each worklist when retrieving the global workflow actions. Maybe there should be a setting somewhere indicating if the global workflow actions should be included.
Shouldn't matter here
This benchmark is for anonymous users, and indeed it made no perceptible difference when the code was removed.
That being said, if the worklists have no purpose in the actions, they should go away.
We should do some profiling for logged-in users with review queue, editing a document etc - but I was trying to make these targets small and easy to knock down in time for 2.1.
pts
Is it possible to assess how much the PTS is responsible for any slowness?
Yes, but not from inside PTProfiler
You'll have to use the Python profiler or similar for that. I'm not worthy. ;)
I guess a simple benchmark would be to remove PTS from a Plone install and benchmark it before/after.
Schwartzian Transform on sorts
This might only be noticeable in bigger sites, but still.
We can use a Schwartzian Transform when sorting. That is, precompute the keys and put them in a list of tuples, instead of having the sort look up the attributes of objects for every comparison.
    unsorted = [object1, object2]
    sortlist = [(o.sortkey, o) for o in unsorted]
    sortlist.sort()
    sorted = [x[-1] for x in sortlist]
sortlist becomes a list of tuples, like [(sortkey1, object1), (sortkey2, object2)]
This was mentioned by the Reflab guys some time ago, and they did some profiling indicating noticeable difference in bigger lists.
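A self-contained version of the same pattern, as a sketch. The Item class and its sortkey method are invented for the example; the call counter is only there to show that each key is computed once. An index is added to each tuple so that two equal keys never cause the objects themselves to be compared.

```python
class Item:
    """Hypothetical content object with a costly-to-compute sort key."""

    lookups = 0  # counts how often a key is computed

    def __init__(self, title):
        self.title = title

    def sortkey(self):
        Item.lookups += 1
        return self.title.lower()


def schwartzian_sort(objects):
    # Decorate: compute each key exactly once. The index breaks ties,
    # so the objects themselves are never compared.
    decorated = [(obj.sortkey(), index, obj) for index, obj in enumerate(objects)]
    decorated.sort()
    # Undecorate: keep only the original objects.
    return [entry[-1] for entry in decorated]


items = [Item("News"), Item("events"), Item("Members")]
ordered = schwartzian_sort(items)
print([item.title for item in ordered])  # → ['events', 'Members', 'News']
print(Item.lookups)  # → 3
```

In current Python, sorted(objects, key=...) also computes each key only once, so the explicit decorate/undecorate steps are mainly useful on older interpreters or when the decorated list is needed for something else.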
Schwartzian transforms
I love Schwartzian transforms and try to use them everywhere they make sense. (Un)fortunately, I don't see very many unoptimized sorts in CMFPlone. The few that are around are in Python scripts and tend to operate on small lists (allowedTypes, availableLanguages, RoleMap, worklists, configlets). Not much to be gained here, I fear.
Faster calendar
Hi,
I spent a couple of hours splitting the calendar portlet into 2 parts:
- A Python script that builds the data
- A new calendar portlet with only simple path expressions to render that data (no complex nested expressions)
On my development box:
- standard calendar portlet with 4 events in current month is built in 71 ms
- the faster calendar portlet with same events is built in 49 ms (saving about 30%)
This is only a direct translation to Python of the logic found in the standard portlet. Things could certainly be faster with better-optimized code.
It is poorly tested, but behaves exactly like the standard portlet.
Want the code? Where should I post it (I'm not a Plone committer :o)?
Cheers
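The split described above (Python builds the data, the template only renders it) could look roughly like the sketch below. This is not the code offered in the comment; the function name, the event_days set and the today tuple are illustrative assumptions, and Python's standard calendar module stands in for whatever the portlet really uses.

```python
import calendar


def build_calendar_data(year, month, event_days, today=None):
    """Return a list of weeks; each week is a list of day dicts.

    `event_days` is a set of day numbers that have events and `today`
    an optional (year, month, day) tuple; both are illustrative inputs.
    """
    weeks = []
    for week in calendar.monthcalendar(year, month):
        row = []
        for day in week:
            row.append({
                "day": day,  # 0 marks a cell outside the month
                "event": day in event_days,
                "is_today": today == (year, month, day),
            })
        weeks.append(row)
    return weeks


data = build_calendar_data(2005, 4, {3, 5, 6, 25}, today=(2005, 4, 6))
for week in data:
    print([cell["day"] for cell in week])
```

With the data precomputed like this, a template only needs to loop over weeks and cells and read day, event and is_today, with no nested Python expressions per cell.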
Metadata
Quoting from above:
My proposal is to introduce a switch in site_properties called exposeDCMetaTags that is off by default, since no search engines or crawlers use it, and let the 3 people in the world (of which 2 are librarians and the last one flunked librarian school ;) turn it on with a performance penalty if they need it.
We only want meta name="description" in Plone by default, as this is the only one used by search engines - even keywords are of questionable usefulness.
There's actually no reason to dis the librarians in the name of performance since one can have the best of both worlds. There is a relatively standard way of handling Dublin Core metadata that makes it available to tools that want it without making the main pages load significantly more slowly. It's done by making an RDF of metadata available, but only linking to it from the source XHTML. Thus humans browsing don't spend any time waiting for the invisible metadata to be generated / downloaded, but savvy search engines (and yes, some are already deliberately scooping up external metadata -- you can check your own logs once you've implemented it) and Semantic Web tools (as well as presumably librarians) can specifically request it. Plus, this technique is open to easy future expansion as it's not at all restricted to Dublin Core metadata. PRISM and FOAF metadata are currently found in such RDF stores in the wild now, too.
The Dublin Core site itself makes use of this technique (although they still embed the data in the XHTML, too, even though that's overkill -- of course, being the Dublin Core site it makes sense for them to really show off Dublin Core metadata).
We've had a working dynamic metadata generator in place at Saugus.net too for quite a long time, built using only raw Zope CMF and not requiring anything from Plone. I'm sure it could very easily be ported to Plone, though. The general idea is to make a page template called metadata.rdf that can be accessed as a method of any object in the tree, dynamically load it up with Dublin Core goodness, and add a single line like:
to the standard head section macro. It's fast and easy and doesn't sacrifice any functionality for tools that do make use of metadata. We're still using CMF 1.4, but it should be a trivial port to 1.5. I should check on getting it added into the base CMF.
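For readers who have not seen such an RDF document, here is a small sketch of what a metadata.rdf response might contain. It is generated with Python's xml.etree.ElementTree purely for illustration; the comment above describes a page template, not this code, and the function name, example URL and field values are assumptions.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def render_metadata_rdf(url, fields):
    """Serialize Dublin Core fields for one resource as RDF/XML."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("{%s}RDF" % RDF_NS)
    # One rdf:Description element per resource, identified by its URL.
    description = ET.SubElement(
        root, "{%s}Description" % RDF_NS, {"{%s}about" % RDF_NS: url}
    )
    for name, value in fields.items():
        ET.SubElement(description, "{%s}%s" % (DC_NS, name)).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = render_metadata_rdf(
    "http://example.org/doc",
    {"title": "Welcome", "creator": "limi"},
)
print(xml_text)
```

Because the document is only produced when a client asks for it, the normal page view carries no cost beyond the link in the head section.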
maybe a small solution to improve speed...
I've made some benchmarks on listFilteredActionsFor().
60-70% of the time is spent here:
    # Include actions from specific tools.
    for provider_name in self.listActionProviders():
        provider = getattr(self, provider_name)
        self._listActions(append, provider, info, ec)
This code asks every single action provider for a list of matching actions. As expected most of the time is used to evaluate the conditions for every single action, security checks etc.
We can't optimize the checks for the action conditions. It might be possible to speed up the check for the action permissions. I assume that most actions share a small number of permissions, e.g. View, Manage Portal or so. Instead of checking the same permission over and over again against the context object, one could build up a cache that stores the result of each permission check for the context object within one request. This could speed things up a bit...
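A sketch of the caching idea, independent of Zope. The PermissionCache class, the slow_check function and the action list are all hypothetical; in a real site the check callable would be whatever the security machinery provides, and the cache object would have to be created per request and discarded afterwards.

```python
class PermissionCache:
    """Per-request cache of (permission, context) check results.

    `check` is any callable taking (permission, context); the real
    security check it stands in for is only assumed here.
    """

    def __init__(self, check):
        self._check = check
        self._results = {}

    def allowed(self, permission, context):
        # id() is enough as a key while the context object stays alive,
        # which holds for the duration of a single request.
        key = (permission, id(context))
        if key not in self._results:
            self._results[key] = self._check(permission, context)
        return self._results[key]


calls = []


def slow_check(permission, context):
    calls.append(permission)  # record every real check
    return permission == "View"


context = object()
cache = PermissionCache(slow_check)
actions = [("view", "View"), ("edit", "Modify portal content"),
           ("sharing", "View"), ("manage", "Manage portal")]
visible = [name for name, perm in actions if cache.allowed(perm, context)]
print(visible)     # → ['view', 'sharing']
print(len(calls))  # → 3
```

Four actions need only three real checks here because two of them share the View permission; the saving grows with the number of actions that share permissions.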