#125: Ensuring link/reference integrity (removing 404 links)
- Proposed by
- Alexander Limi
- Seconded by
- Andi Zeidler
- Proposal type
- Architecture
- Assigned to release
- Repository branch
- plip125-link-integrity-bundle
- State
- completed
Motivation
One of the problems Plone needs to solve is automatically associating objects that "touch" each other, so that you know:
- Which items will be affected if you delete the current item
- Which items will need updating if you move the current item.
Also, when you move an object that an old bookmark points to, visiting the old location should automatically redirect the user to the new location (possibly with a message saying that they were redirected).
Assumptions
- This is still very much a work-in-progress — comments appreciated.
- We will not consider in-process / long-running approaches like CMFLinkChecker, because an already busy Plone site should not also be spending its resources on this kind of checking.
- This proposal intentionally does not deal with external links; use a normal link checker for those.
- The traditional way of dealing with this sort of problem has been to use hashes for object/item names, but we're not willing to sacrifice logical item naming and nice URLs for this.
- For the move/rename case, we would like to extend the RedirectionTool, a proven, existing tool to handle these kinds of references and redirections. This is a suggestion, though - and if someone can come up with a compelling reason to not use it and rewrite from scratch, I won't let that block this PLIP. :)
Proposal
Here are some use cases to show how I envision this being handled:
Use case: Deleting an item
- User adds a normal page
- In that page, he references two images
- When the page is saved, it looks for local references and creates an isReferencedBy reference on both the page and the images
- Time passes
- A different user comes along, tries to delete one of the images referenced above
- He then gets a warning saying: "The image XYZ is used in the page ABC, are you sure you want to delete it?"
- (If we want to be a bit smart about this, the RedirectionTool — see below — could register that a page was explicitly deleted and say something like "the page you were looking for was deleted, maybe some of the following pages contain what you were looking for?" followed by a search. This is the current behaviour of RedirectionTool, minus the explicit knowledge that something was deleted.)
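The save-time step and the delete warning in this use case can be sketched as follows. This is a minimal, hypothetical illustration, not the Archetypes or LinkIntegrity implementation: it assumes page bodies are HTML, treats a link as "local" when it resolves under the site URL, and keeps the isReferencedBy relation in a plain in-memory index. LocalLinkExtractor, ReferenceIndex and record_page_links are made-up names for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LocalLinkExtractor(HTMLParser):
    """Collect the paths of href/src targets that resolve inside the site."""

    def __init__(self, page_url, site_url):
        super().__init__()
        self.page_url = page_url
        self.site_prefix = site_url.rstrip("/") + "/"
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page before deciding if they are local.
                absolute = urljoin(self.page_url, value)
                if absolute.startswith(self.site_prefix):
                    # Keep only the path; query strings and fragments are dropped.
                    self.paths.add(urlsplit(absolute).path)


class ReferenceIndex:
    """Hypothetical forward and reverse (isReferencedBy) link maps, keyed by site path."""

    def __init__(self):
        self.references = defaultdict(set)     # source path -> target paths
        self.referenced_by = defaultdict(set)  # target path -> source paths

    def update(self, source, targets):
        # Forget what the previous version of the page linked to, then record the new links.
        for old_target in self.references.pop(source, set()):
            self.referenced_by[old_target].discard(source)
        self.references[source] = set(targets)
        for target in targets:
            self.referenced_by[target].add(source)

    def delete_warning(self, target):
        """Return a confirmation message if target is still referenced, else None."""
        sources = sorted(self.referenced_by.get(target, ()))
        if not sources:
            return None
        return f"The item {target} is used in {', '.join(sources)}, are you sure you want to delete it?"


def record_page_links(index, site_url, page_path, html):
    """Save-time hook (hypothetical): extract local links and update the index."""
    extractor = LocalLinkExtractor(site_url.rstrip("/") + page_path, site_url)
    extractor.feed(html)
    extractor.close()
    index.update(page_path, extractor.paths - {page_path})  # ignore links to the page itself
```

For example, after record_page_links(index, "http://example.com", "/abc", '<img src="images/xyz.png">'), index.delete_warning("/images/xyz.png") names /abc as the page that uses the image. Re-saving the page without the image removes the reverse reference again, so the warning disappears.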
Use case: Renaming/moving an item
- User creates a page
- Some time later, the user decides to reorganize his web site, and moves the pages around, including the page created in the first step
- Upon being moved or renamed, the item registers its old location and its new location in a list that maps old location → new location (RedirectionTool is a working implementation of this)
- Another user who bookmarked the old page visits the old location
- RedirectionTool sees that this is a 404, looks up the old location in its list and finds the new location
- User is redirected to new location, potentially with a message saying "You have been redirected, please update your bookmarks"
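The old location → new location list from the steps above could look roughly like this. It is a sketch, not RedirectionTool's actual API: RedirectionStorage, handle_request and REDIRECT_MESSAGE are hypothetical names, a set of existing paths stands in for real traversal, and answering with a 301 status is an assumption (the proposal does not specify one).

```python
REDIRECT_MESSAGE = "You have been redirected, please update your bookmarks"


class RedirectionStorage:
    """Hypothetical in-memory map of old path -> new path."""

    def __init__(self):
        self._redirects = {}

    def add(self, old_path, new_path):
        # Collapse chains: anything that pointed at old_path now points at new_path,
        # so /a -> /b followed by /b -> /c leaves /a -> /c and /b -> /c.
        for source, target in list(self._redirects.items()):
            if target == old_path:
                self._redirects[source] = new_path
        self._redirects[old_path] = new_path
        # An object moved back to a location it once had makes that path live again.
        self._redirects.pop(new_path, None)

    def get(self, path):
        return self._redirects.get(path)


def handle_request(path, existing_paths, storage):
    """Return (status, location, message) for a request to path."""
    if path in existing_paths:
        return 200, path, None
    # The storage is only consulted when the request would otherwise be a 404.
    new_path = storage.get(path)
    if new_path is not None:
        return 301, new_path, REDIRECT_MESSAGE
    return 404, None, None
```

Collapsing chains when a move is recorded means a bookmark that survived several reorganisations still needs only one lookup and one redirect.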
Implementation
I found the following note in a mail from Ben Saller (IIRC) — I'm including it here since it might be helpful in parts of the implementation:
Actually this is possible with core archetypes by doing:
from Products.Archetypes.references import HoldingReference
...and in your reference field schema definition, set:
referenceClass=HoldingReference
This will raise a BeforeDeleteException whenever someone tries to delete an object that is the target of an existing reference. There is also a CascadeReference which deletes all references when deleting the main object.
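To make the difference between the two reference classes concrete, here is a small pure-Python model of their delete semantics. It does not use Archetypes at all: ReferenceModel and the "holding"/"cascade" kinds are hypothetical, BeforeDeleteException is a local stand-in for the exception Zope raises to veto a delete, and it assumes a cascade reference removes the referenced (target) objects together with the source.

```python
class BeforeDeleteException(Exception):
    """Local stand-in for the exception Zope uses to veto a deletion."""


class ReferenceModel:
    """Hypothetical model of holding vs. cascade reference semantics."""

    def __init__(self, objects):
        self.objects = set(objects)
        self.references = []  # (source, target, kind); kind is "holding" or "cascade"

    def add_reference(self, source, target, kind):
        if kind not in ("holding", "cascade"):
            raise ValueError(f"unknown reference kind: {kind}")
        self.references.append((source, target, kind))

    def delete(self, path):
        # Holding: refuse to delete a target while any source still holds it.
        holders = sorted(s for s, t, k in self.references if t == path and k == "holding")
        if holders:
            raise BeforeDeleteException(f"Can't delete {path}, it is held by {', '.join(holders)}")
        # Cascade: remember which targets go away together with this source.
        cascaded = [t for s, t, k in self.references if s == path and k == "cascade"]
        self.objects.discard(path)
        # Drop every reference to or from the deleted object.
        self.references = [(s, t, k) for s, t, k in self.references if path not in (s, t)]
        # No rollback here; in Zope the surrounding transaction would be aborted instead.
        for target in cascaded:
            if target in self.objects:
                self.delete(target)
```

Note that a holding reference only protects its target: deleting the source is always allowed and simply releases the hold.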
Progress log
Andi has implemented the integrity part of this in SVN. The redirection/move part was done by optilude. Both are pretty much finished and have been merged into 3.0alpha (as the state above indicates).
Yup, might be interesting for the redirection part
For the link integrity, I believe Andi has used some Z3 stuff, but I'll let him explain that part. :)
definitely interesting...
first of all, thanks for the update here, limi. i was offline last week, so i only saw it late last night after digging through all that spam... :)
anyway, you're right, the redirecting part isn't done yet, or rather, it's not integrated yet. i've looked at topp.rose back at the island (and imho i understood it, too :)) and i think it really covers like 90% of that use case. the only thing missing is to properly hook it up with plone (or maybe rather the link integrity stuff). my plan is to try to do just that right after i've found a way to test the delete use case, which turned out to be a bit tricky...
and, as for the explanation part, that's another thing to do (see http://dev.plone.org/collective/browser/LinkIntegrity/trunk/TODO.txt) before integrating topp.rose. the code does use z3 stuff indeed, and i guess at first glance it even uses some "funny" ways of getting what we want, so i'd better write a document explaining how and, more importantly, why it does it like that. i think it'll look less scary then... :)
prototype for using z3 events
The basic architecture uses IObjectMoved and IObjectDeleted events to maintain a simple BTree storage of the paths an object once occupied (keyed by path, and oid or uid).
By means of a traversal adapter, requests for empty paths (paths that no longer resolve to an object) that match an old object location in the storage are redirected to the new URLs.
This has much lower overhead and potentially scales better than using references. The traversal adapter, storage utility, redirection view and event subscribers can also be overridden to extend the behavior if needed.
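The event-based prototype described above might be sketched like this. Everything here is a simplified stand-in: the two event classes, the tiny subscriber registry (in place of zope.component event dispatch), the dict used instead of a BTree, and the uid → current path mapping (in place of a catalog lookup) are assumptions for illustration. Storing the uid rather than the new path means the current location is resolved at request time, and a path whose uid no longer resolves can be reported as deleted.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ObjectMoved:
    """Simplified stand-in for a Zope 3 object-moved event (hypothetical)."""
    uid: str
    old_path: str
    new_path: str


@dataclass
class ObjectDeleted:
    """Simplified stand-in for an object-deleted event (hypothetical)."""
    uid: str
    path: str


# Tiny subscriber registry standing in for zope.component's event dispatch.
_subscribers: Dict[type, List[Callable]] = defaultdict(list)


def subscriber(event_type):
    """Register the decorated function as a handler for event_type."""
    def register(handler):
        _subscribers[event_type].append(handler)
        return handler
    return register


def notify(event):
    """Call every handler registered for the event's exact type."""
    for handler in _subscribers[type(event)]:
        handler(event)


# Old path -> uid. In the ZODB this would be a BTree; a dict keeps the sketch runnable.
old_paths: Dict[str, str] = {}
# uid -> current path, standing in for a catalog UID lookup.
current_paths: Dict[str, str] = {}


@subscriber(ObjectMoved)
def record_move(event):
    old_paths[event.old_path] = event.uid
    old_paths.pop(event.new_path, None)  # the new location is occupied again
    current_paths[event.uid] = event.new_path


@subscriber(ObjectDeleted)
def record_delete(event):
    old_paths[event.path] = event.uid
    current_paths.pop(event.uid, None)


def lookup(path) -> Tuple[str, Optional[str]]:
    """Meant to be called only for paths that no longer traverse to an object."""
    uid = old_paths.get(path)
    if uid is None:
        return "unknown", None
    if uid in current_paths:
        return "moved", current_paths[uid]
    return "deleted", None
```

Because lookup only runs when traversal fails, an old path that has since been reused by a new object is never redirected; the live object simply wins. The "deleted" result is what a view could use to show the "the page you were looking for was deleted" message from the first use case.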