Tidy's technology dates from the 1990s, when browsers weren't standardized. Its behavior is loosely based on HTML4 semantics but matches no modern browser. After years without active maintenance, Tidy has now been revived as "tidy-html5", with very different behavior, and the older Tidy is no longer being packaged. As noted earlier, Tidy also does HTML cleanup unrelated to fixing errors. Together, these issues have led to many bugs being filed against it on Phabricator, and a replacement has been requested since at least 2013.

HTML5 is the standard today, and its parsing algorithm is clearly specified, which has led to compatible implementations across browsers and other libraries. The algorithm also specifies how broken markup should be fixed. In this new technological landscape, Tidy should really be replaced with an HTML5 parser that fixes up broken markup and generates valid, well-formed HTML in the standard way.

However, Wikimedia wikis have a huge corpus of pages whose markup relies on Tidy's fixups. An immediate, straightforward replacement of Tidy with a third-party HTML5-based tool is not feasible, since such a tool would repair some markup differently, and this can break how pages look. So we are replacing Tidy with our own tool, which is based on the HTML5 specification but also adds a few Tidy-compatibility workarounds to minimize the impact of the switch.