Encyclopedia:PHP script tech talk
{{Archive}}
Simply put, isn't this page dead? I can't see what is still pending and what is fixed. Also, the intro says "do not add bugs and feature requests here", yet most of the stuff here is about features and bugs.

''Yup, this page is dead. Those curious should direct themselves to the Wikitech-l [[Wikipedia:Mailing lists|mailing list]] or set something up on meta under [[m:How to become a Wikipedia hacker]].''

--------

'''This is the place to discuss bug fixes and planned features on a more "technical" level.''' (See also the new [http://nupedia.com/mailman/listinfo/wikitech-l wikitech-l] mailing list.)

Please do not add bugs and feature requests here; instead, see [[Wikipedia:PHP script]] for more details of how to report bugs.

== Serious bugs ==
Things that should be repaired ASAP.

=== diff won't work ===
I can't get diffs. It could be the cache, if it is running here already. Somebody please fix this! --[[user:Magnus Manske|Magnus Manske]]
:Yes, it's a cache bug. I checked a fix into CVS last night, but forgot to mention it here. (2002/2/8) --[[user:Brion VIBBER|Brion VIBBER]]

=== #REDIRECTs that end in an eternal edit conflict ===
*Suggestions? --[[user:Magnus Manske|Magnus Manske]]
::I'm probably stating the obvious here, but this appears to be caused by the page trying to redirect to a different page than has been specified. If this page doesn't exist (as is usually the case) then the page goes into edit mode when you click Save, which gives an edit conflict. If the page ''does'' exist, then no edit conflict occurs, but the redirect does not go to the expected place (as for [[Zundark/Old_Talk]], which is redirected to [[user:Zundark]] but actually ends up at [[user:Zundark/Old_Talk]]). --[[user:Zundark|Zundark]], 2002 Feb 3
:::(2002/02/03 20:43 PST) Fix is in CVS for the problem when the redirected page does exist. (s/$this->'''$'''subPageTitle/$this->subPageTitle/) Doesn't seem to have fixed the doesn't-exist problem, I'll look at it some more.
--[[user:Brion VIBBER|Brion VIBBER]]
----
== Volunteers wanted ==
These tasks need volunteers to hack 'em!

=== Mask minor edits on Recent Changes ===
Might be fixed by a patch from [[user:Brion VIBBER|Brion VIBBER]] and myself --[[user:Magnus Manske|Magnus Manske]]

=== Fix the Recent Changes "(# changes)" counter ===
Might be fixed by a patch from [[user:Brion VIBBER|Brion VIBBER]] and myself --[[user:Magnus Manske|Magnus Manske]]
* I just looked in CVS (that's 2002/2/4 00:02 Amsterdam time) and it seems you still add a variable $addoriginal to the count. But I think that is silly because you should never count the current page if you are counting the changes. So just remove $addoriginal and the problem is solved. -- [[user:Jan Hidders|Jan Hidders]] (PS. Wouldn't it be nice if the sign-shortcut ~ ~ ~ would always be replaced with name and time? :-))
::Hey, that's a good idea! Especially for the bug-report pages... [[user:Brion VIBBER|Brion VIBBER]] (2002/2/4 15:18 PST)
::: Yes, unfortunately it's the only thing that made sense in my remark. :-/ What I should have said was the following. The variable $addoriginal should be 0 if the page did not already exist the previous day and the current page is a minor edit and the user does not want to see minor edits. -- [[user:Jan Hidders|Jan Hidders]] (2001/2/5 8:45 GMT+1)

=== Fixing some parser bugs ===
:Especially the <pre> tags.
::I've replaced removeHTMLtags() with behavior more like the old usemod version; i.e., instead of forbidding a few tags, it allows only a small number. Thus, no <<b></b>span>, <<b></b>object> etc. However it still needs to be able to strip out unknown elements/parameters; I can still write <b style="color: blue; font-weight: normal; text-decoration: underline; cursor: pointer" onClick="alert('Not a link! Evil Javascript!')">naughty things like this</b>.
{this is a 'link' that isn't a link, and runs some JavaScript code} (2002/02/04 20:48 PST) --[[user:Brion VIBBER|Brion VIBBER]]
::Also, I commented out the line in subParseContents that makes &<b></b>amp; followed by text that could be an entity into the entity. I suspect it was put in to fix pages that were getting over-escaped during editing, but that bug seems to be gone now and it just makes it hard to write the name of an entity. Ie, "&amp;" should *not* appear as just an ampersand, but an ampersand followed by "amp;". --BV
----
== Brainstorming ==
Ideas for solutions needed here.

=== Speeding up the PHP script ===
* Taking apart "specialPages.php"
This file is getting quite large, resulting in high compilation times. I suggest two steps:
# make pages like "special_userlogout.php" for each function ("userlogout", in this case)
# after that, change the include statement so it only includes the needed function. This, in turn, can include other shared functions
I started doing this now. --[[user:Magnus Manske|Magnus Manske]]
* Caching of pages for reading only (Jimbo's idea).
** Could be tricky. Would have to adapt to viewing preferences and newly created pages (red/blue links).
:It may be possible to cache a 'common' almost-final version, which can then have a regexp run over it to set the link color and paragraph justification, and then inserted into the header/footer; this would at least save parsing the wiki page every time. Still need to deal with new pages though... Simplest way might be to run the "which pages link to this" check on a newly created page and expire the cached versions of anything that does. --[[user:Brion VIBBER|Brion VIBBER]]
:: A lot of the regexp work could be skipped by using a ''quick'' preprocessor (ideally one that slaps in the header/footer without even looking at the text) and/or [[Cascading Style Sheets|CSS]]. --[[user:Uriyan|Uriyan]]
:::Yup.
CSS won't handle the difference between <span style="color: red; text-decoration: underline; cursor:pointer;" onClick="alert('This kind of HTML should be filtered out. No, really. I guess that means I volunteer to write the filter code. Sigh.')">red links</span> and [classic links][[How does one edit a page|?]] for new pages, though. I recommend we change or eliminate one or the other. --[[user:Brion VIBBER|Brion VIBBER]]

I take that back, CSS should do fine there. How does this sound:<br>
This is a <span class="newlinkedge">[</span><a href="foo" class="newlink">new link</a><span class="newlinkedge">]<a href="foo">?</a></span>.

where we define either:
 a.newlink { color: red; }
 .newlinkedge { display: none; }
or
 a.newlink { color: black; text-decoration: none; }
 .newlinkedge { }
in the style sheet? The text portion will still be clickable in the old-style case, though that could probably be "fixed" if desired. --[[user:Brion VIBBER|Brion VIBBER]]
::(2002/02/03 15:05 PST) I've changed the CVS version to use style sheets for the link colors, paragraph justification, and text/background color. (Try it at my [http://leukas.dyndns.org/wiki/ test server], if you can find your way around the partially Esperanto-localized interface.) Keeps down the number of things that need to be changed if somebody wants to change the styles further, and should make the HTML-ized page guts cacheable.
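As a rough illustration of the style-sheet idea above, the sketch below emits the same link markup for every reader and leaves the "red link" versus "classic [link]?" choice to one of the two style sheets quoted in the discussion. It is written in Python rather than in the wiki's PHP, and the names <code>render_link</code>, <code>stylesheet_for</code> and the <code>prefers_red_links</code> flag are hypothetical; nothing here is taken from the actual script.

```python
from html import escape
from urllib.parse import quote

# The two alternative style sheets quoted in the discussion above.
STYLE_RED_LINKS = "a.newlink { color: red; } .newlinkedge { display: none; }"
STYLE_CLASSIC_LINKS = (
    "a.newlink { color: black; text-decoration: none; } .newlinkedge { }"
)


def render_link(title: str, exists: bool) -> str:
    """Return link HTML that does not depend on any user preference.

    Hypothetical helper: the URL scheme (spaces become underscores) is an
    assumption made for this sketch only.
    """
    href = escape(quote(title.replace(" ", "_")), quote=True)
    text = escape(title)
    if exists:
        return f'<a href="{href}">{text}</a>'
    # Missing page: wrap the brackets and the "?" in spans so a style
    # sheet can hide them (red-link style) or show them (classic style).
    return (
        '<span class="newlinkedge">[</span>'
        f'<a href="{href}" class="newlink">{text}</a>'
        f'<span class="newlinkedge">]<a href="{href}">?</a></span>'
    )


def stylesheet_for(prefers_red_links: bool) -> str:
    """Pick the per-user style sheet; the page body itself stays the same."""
    return STYLE_RED_LINKS if prefers_red_links else STYLE_CLASSIC_LINKS
```

Because the per-user choice lives only in the style sheet, the rendered body is the same for all readers, which is what would make it cacheable.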
'''Caching proposal'''
# Create a new field in the ''cur'' table named ''cur_cache'' (MEDIUMTEXT), empty by default
# When a page is saved after an edit, the cache is cleared
# When a page is viewed,
## and the cache has been used X times, it is cleared (enforced up-to-date)
## and the cache entry contains text, the cache is adapted to current user settings and displayed
## and the cache is empty, the text is rendered, displayed and stored in the cache field
# When a new page is created, the cache of all pages that link to the new page is cleared
# Pages with {{variables}} are not cached
--[[user:Magnus Manske|Magnus Manske]]
:: Why not simply update the cache field every time the page is edited? You have to parse the page then anyway because it is presented after the edit. -- [[user:Jan Hidders|Jan Hidders]] PS. I know I'm getting annoying but can I say again that we first should measure which pages are eating up the server CPU time? Otherwise the implementation of caching might be a waste of time and effort that unnecessarily complicates the code.
:::On viewing an uncached page, the content is rendered anyway. The result can be slightly altered and stored as the cache. Generating the cache upon ''saving'' means it will have to be rendered especially for that purpose, thus wasting resources. Also, when the cache is flushed, the page won't be cached again until after the next edit.
:::That said, we should of course check the special pages and improve their speed. I already (kinda) cached the Most Wanted. The other candidate is Orphans, but I don't know how to cache that; anyway, with de-orphanising progressing, the orphans list will get shorter, and the popularity of that page might drop.
:::Also, the Main Page has to run a database request each time it is viewed, to keep an up-to-date article count.
:::Editing this page, connected with 10MBit to the Internet, wikipedia performs quite well; I know this will change once the US goes online ;) --[[user:Magnus Manske|Magnus Manske]]
:::: I must be misunderstanding something. After submitting I am always redirected to the page I just edited, right? So are you now suggesting that I would then see an older cached version without the changes I just made? Isn't that a bit confusing for the writer? -- [[user:Jan Hidders|Jan Hidders]]
:::::I suspect the sane behavior would be to clear the cache field when a page is saved (and the cache fields of any pages that link to it, if it's a new page). Then, when the page is loaded up again (for the edited page, that would be immediately), the empty cache is noticed and the page is rerendered and stored.
::::::Yes, that's what I meant. Sorry for being unclear. I added a line to the proposal. --[[user:Magnus Manske|Magnus Manske]]
:::::::But why then do you need the rule for "enforce up-to-date"? -- [[user:Jan Hidders|Jan Hidders]]
::::::::Just a safety mechanism to ensure every page is updated once in a while. Might not be necessary, but it won't do much harm if set to a high value. --[[user:Magnus Manske|Magnus Manske]]
:::::As far as the mainpage... how many pages actually use those <nowiki>{{blahblah}}</nowiki> things? Should we not cache pages that contain them? -- [[user:Brion VIBBER|Brion VIBBER]]
:::::No other pages have these, AFAIK. We could count the number of "{{" occurrences before and after variable replacement, and if it is unchanged, the page can be cached, otherwise not. --[[user:Magnus Manske|Magnus Manske]]
----
'''What eats server time'''

I suspect the RecentChanges viewings eat tremendous resources and are the bottleneck. The web server and the PHP script are usually snappy, because previewing, which doesn't need db access, is fast. Everybody working on the site calls RecentChanges every couple of minutes.
The server sped up noticeably when Jimbo changed the default from 250 down to 50 pages. It seems that to create RecentChanges, we search through the whole cur table and then sort and present the latest ''n'' changes, correct? This means as the database grows, it will only get slower. How about this suggestion: add a new table recent_edits, which only stores information about the recent edits (page title, user name, comment, timestamp), so that we only need to dump out the first ''n'' entries from that table for every RecentChanges view (hopefully without need to sort them)? Maybe recent_edits doesn't even have to be a mysql table, just a list somehow that we prune down every once in a while. Or maybe even keep a ready-made RecentChanges HTML page always up to date and serve it statically. 2/7/2002 [[user:AxelBoldt|AxelBoldt]]
:I think this is a good idea; after all, the list of recent changes doesn't exactly change completely every time it's loaded; new things pop up on the top, and old things drop out of the range of interest on the bottom or, occasionally, in the middle. I'll try implementing this tonight... [[user:Brion VIBBER|Brion VIBBER]]
:On second thought, is this really necessary? Can't we index the table by cur_timestamp as well as by title/id? The database could then easily ignore all entries that had not been modified up to a certain point. I'm not really a MySQL guru though, I don't know how to implement this (or if there's some good reason why it can't be done). Caching the default display of RecentChanges is easy enough though, and ought to save a few cycles -- there are probably more views than there are changes. [[user:Brion VIBBER|Brion VIBBER]]
::Implemented caching for RecentChanges default settings. On my small test database (457 pages) I see a roughly 100% speedup on loading [[special:Recentchanges]].
(2002-02-08 00:33) [[user:Brion VIBBER|Brion VIBBER]]

(2002/02/07 22:29 PST) I've added a 5-minute minimum wait between refreshes of WantedPages (see [http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/wikipedia/phpwiki/fpw/special_wantedpages.php.diff?r1=1.1&r2=1.2 revision 1.2 of special_wantedpages.php] and [http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/wikipedia/phpwiki/fpw/wikiTextEn.php.diff?r1=1.13&r2=1.14 1.14 of wikiTextEn.php]). Good idea? Bad idea? Shouldn't affect legitimate users, but makes it slightly more difficult for malicious or accidental over-refreshing to overwhelm the server. --[[user:Brion VIBBER|Brion VIBBER]]

How about decentralizing the Wiki? If it was relatively easy to get different Wiki servers to talk to each other and do some automatic linking then it should speed things up significantly. I find the Wikipedia a great idea and I am prepared to provide some server space (because of my interests, preferably the medical bit). --[[user:Mathis|Mathis]]
----
* '''Database operations & efficiency'''
** I notice there are a lot of mysql_connect()/mysql_close() pairs in the code; depending on the page loaded, the database can be opened and closed from 5 to 12 times. This seems excessive to me... mysql_connect() doesn't open a new connection if one is left open, and the connection is automatically closed when the script finishes. Surely the overhead of opening and closing multiple connections is worse than the overhead of having one connection open for the whole fraction of a second that it takes to run the several database operations needed by each page? Taking them out doesn't seem to affect performance significantly on my test machine, but I have a small database and I'm the only one using it. (Indeed, it may even make sense to use [http://www.php.net/manual/en/features.persistent-connections.php persistent connections].) --[[user:Brion VIBBER|Brion VIBBER]] (2002/02/06 02:37 PST)
***I can't access the CVS from here.
Why don't you comment out the mysql_close() lines, and see what happens when Jimbo actually updates the running version (in a month or so;). If that works out, we could try the persistent connections, if not, we just remove the # again. --[[user:Magnus Manske|Magnus Manske]]
****Okay, done. Guess we'll see what happens... mwoooh haa haa haa... Also, turns out I missed a lot of them in my initial count; revise that to "the database can be opened and closed from 10 to 21 times" per page. [[user:Brion VIBBER|Brion VIBBER]]

We definitely should use persistent database connections (mysql_pconnect instead of mysql_connect). There's no point in constantly transmitting the same passwords and usernames. mysql_pconnect is a faster drop-in replacement for mysql_connect. It only speeds things up if php is running as an apache module, and I assume that's the case. 8/2/2002 [[user:AxelBoldt|AxelBoldt]]
----
* Browser-specific page layout
::I notice though, that the tables in the page layout are setting their border properties based on whether the user agent is Internet Explorer. This explains why the tables have thin black borders in Internet Explorer and no borders in Mozilla... Magnus, is there any reason for this? I'd prefer to replace the table with some CSS markup in any case. --[[user:Brion VIBBER|Brion VIBBER]]
:::The reason is that I like the thin black lines in IE, but other browsers don't support that, they draw ''all'' lines black, which looks ugly (try it!). If you know how to change it, go right ahead :) --[[user:Magnus Manske|Magnus Manske]]
::::Done, checked in. Looks ever so slightly different in IE and Mozilla, but approximately the same as the previous behavior in IE. Also looks okay in Konqueror 2.2.1, ugly but visible in Opera 6 (some beta version I have), but still doesn't show in Netscape 4.x. [[user:Brion VIBBER|Brion VIBBER]] 2002/02/04 15:21 PST
* Pre-compiling of the PHP code.
** Perhaps the [http://apc.communityconnect.com/ Alternative PHP Cache] could help? (I haven't tried it.) --[[user:Clifford Adams|Clifford Adams]]
* Optimize slow code parts (where? why?)
** Do we actually know what the slow parts are? My gut feeling is that the Recent Changes page is the slowest, but it would be nice if we could do some actual measurements on a server that is serving only one client but has a large database. (Does somebody have a big SQL dump?) Anyway, presuming that it is, I looked at the code and I think it can be made much, much more efficient by combining the two SQL queries into one. Right now it computes a JOIN and a GROUP BY in PHP which can be done far more efficiently by the database. However, it should then be possible to do a GROUP BY on the day, which is now hidden in the time stamp. So we would split this column into a day column and a time column. Do I have your permission to attempt this? (But first I would like to know if it is worth it, i.e., if the Recent Changes page plays a major part in the slowing down. I thought Magnus said something about a memory leak in Apache, so perhaps we should try to find that first.) -- [[user:Jan Hidders|Jan Hidders]]
*** My (old) suggestion for Recent Changes optimization was to make it a separate table containing the last 5000+ changes. (The table would store only the RC-related data, not the actual page contents. It could be simply added to by the edit function, and trimmed in daily/weekly maintenance.) This table would eliminate the need for each RC page to search the *entire* DB looking for the most-recently-updated pages. --[[user:Clifford Adams|Clifford Adams]]
**** It doesn't, it uses the indexes to do that. That's the whole point of using a database; they are usually very clever at these things. :-) I would like to add the remark that letting the database do the joins for you does often lead to a performance improvement of orders of magnitude. If I'm right this could be a major boost.
-- [[user:Jan Hidders|Jan Hidders]] (2001/2/5 8:48 GMT+1)
***** But right now, as far as I can see, we don't index on cur_timestamp and we actually do search through the whole database for every RecentChanges request. That needs to be fixed asap. I agree that the RecentChanges code could be written a lot more elegantly using SQL joins. 2/8/2002 [[user:AxelBoldt|AxelBoldt]]
* Eliminate the access count function ("This page has been accessed 6 times"). I have a feeling that the huge number of writes to the database may be killing performance. A much less expensive way to do this would be to process the Apache log files daily or hourly with a separate script. --[[user:Clifford Adams|Clifford Adams]]
** I think it'd be cool to make a new statistics module for logging all accesses in the db. This can be done as a mod to Apache, I believe: instead of sending hits to the log, you send 'em to a db and use that for the hit counter. To keep db access sane, using INSERT DELAYED would be a good compromise between realtime stats and efficiency. If you added database replication it'd be one cool, scalable beast.
* The search could be sped up if we used a fulltext index and the match operation as described in http://www.mysql.com/doc/F/u/Fulltext_Search.html [[user:AxelBoldt|AxelBoldt]]
----
== Resolved issues ==
These could probably be deleted or moved to a separate "what were we thinking? / what did we do when this broke before?" page.

=== Pages with "wiki.phtml" subpages ===
*Anyone have an idea why this is? I couldn't reproduce it with my local copy. --[[user:Magnus Manske|Magnus Manske]]
::I haven't seen any of these lately. I *think* it was fixed by using an absolute path for $THESCRIPT. --[[user:Brion VIBBER|Brion VIBBER]]

=== 130.94.122.xxx bug ===
*This is serious, because of its potential for masking vandalism.
::Possible fix submitted; I suspect that there's some kind of proxying going on at the server end. But I could be totally wrong.
--[[user:Brion VIBBER|Brion VIBBER]]
:::We'll see, it is in the mail now... --[[user:Magnus Manske|Magnus Manske]]
::::Fixed as of Feb 4 02.

=== Change password does not work ===
For more details, see [[Wikipedia:Bug Reports]]. Until this is fixed any users who change their password cannot log in (like me). --[[User:Chuck Smith]]
:Strange. It works fine on my local copy. Tried to log in with your old password? --[[user:Magnus Manske|Magnus Manske]]
::Apparently fixed sometime before 2002/2/8. [[user:Brion VIBBER|Brion VIBBER]]

[[Category:Wikipedia archives]]