mrkgnao: Great work, adalia. Found this only just now.
I began thinking of writing a forum search engine myself, but MaGog takes too much time already, so I'm glad you took it upon yourself.
In my experience, the most important part of a search engine is its name. Gog Search is... hmmm... OK.
I was planning to call my forum search engine "DemaGogue" or "DemaGog". If you want, you can use it.
Your idea of scanning the whole forum once and then the deltas is the same direction I was thinking of going in, but:
1) You will need a very large server to hold all the posts (i.e. expensive). You might want to restrict what you keep in the database (e.g. not all the text, but perhaps only the title, the text of the first post, and the names and dates of posters).
2) You will quickly run into GOG's IP block for accessing too much data. You might want to discuss it with GOG before attempting the data collection and ask them to whitelist your server's IP, because you're bound to hit the limit very quickly. Note that I was unable to get them to whitelist MaGog, but I am not very good at asking favours.
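For what it's worth, here is a minimal sketch of the "scan once, then only the deltas" idea combined with a fixed delay between requests, so the crawler stays under a rate limit. Everything in it is an assumption: `fetch_index` and `fetch_thread` are hypothetical stand-ins for whatever actually downloads GOG forum pages, the "last post time" per thread is assumed to be visible on the index, and the 2-second delay is a guess, not GOG's real limit.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ThreadSummary:
    # Hypothetical shape of what a forum index page tells us about a thread
    thread_id: str
    last_post_time: int  # e.g. a Unix timestamp of the newest post


@dataclass
class DeltaCrawler:
    fetch_index: Callable[[], List[ThreadSummary]]  # hypothetical: lists threads
    fetch_thread: Callable[[str], str]  # hypothetical: downloads one thread
    min_delay: float = 2.0  # assumed politeness delay in seconds
    seen: Dict[str, int] = field(default_factory=dict)
    _last_request: float = field(default=0.0, init=False, repr=False)

    def _wait(self) -> None:
        # Keep consecutive requests at least min_delay seconds apart
        elapsed = time.monotonic() - self._last_request
        if elapsed < self.min_delay:
            time.sleep(self.min_delay - elapsed)
        self._last_request = time.monotonic()

    def crawl(self) -> Dict[str, str]:
        """Fetch only threads that are new or have new posts since the last crawl."""
        self._wait()
        updated: Dict[str, str] = {}
        for summary in self.fetch_index():
            if self.seen.get(summary.thread_id) == summary.last_post_time:
                continue  # unchanged since last crawl, so skip the download
            self._wait()
            updated[summary.thread_id] = self.fetch_thread(summary.thread_id)
            self.seen[summary.thread_id] = summary.last_post_time
        return updated


# Usage with fake fetchers (no network involved)
index = [ThreadSummary("a", 1), ThreadSummary("b", 5)]
crawler = DeltaCrawler(lambda: index, lambda tid: f"posts of {tid}", min_delay=0.0)
first = crawler.crawl()   # both threads are new, so both are fetched
index[1] = ThreadSummary("b", 6)
second = crawler.crawl()  # only "b" has a newer post, so only "b" is fetched
```

The first run is the expensive full scan; after that, each run only downloads threads whose last-post time changed, which keeps the request count (and the risk of hitting the IP block) much lower.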
Good luck.
Favourited, of course.
Thanks for the tips; I was going to message you eventually about a few things, although I can't remember what now... :/
I know one of them was about links to the forums: if I do go down the scanning route, I'll need to get the links for all the subforums from somewhere... (although, that being said, most of them have fewer than 20 pages, so maybe the post archive would only be needed for General Discussion, and the game forums could just be searched live)
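The hybrid idea (archive the big subforums, search the small ones live) boils down to a per-subforum decision. A rough sketch, assuming the page count of each subforum is already known; the 20-page cut-off comes from the post, while the function name and the page counts are made up for illustration:

```python
from typing import Dict


def plan_subforums(page_counts: Dict[str, int], max_live_pages: int = 20) -> Dict[str, str]:
    """Decide per subforum whether to keep a local archive or search it live.

    Subforums with fewer than max_live_pages pages are cheap enough to scan on
    every search; larger ones (e.g. General Discussion) get an archive.
    """
    return {
        name: "live" if pages < max_live_pages else "archive"
        for name, pages in page_counts.items()
    }


# Hypothetical page counts, not real GOG figures
print(plan_subforums({"General Discussion": 3000, "Some Game Forum": 12}))
# {'General Discussion': 'archive', 'Some Game Forum': 'live'}
```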
I hadn't even considered a name... I just made the original script for the mafia games, and as people found it useful and the forum search is awful, I thought I would expand it. I'll certainly put some more thought into it... ;)
Yeah, I was just thinking of dumping the whole lot into an SQL database... but I've never been great at working out how much space things take up, and given the size of some of the threads it would certainly need a lot, much more than my hosting package is likely to have. But if I don't store all the text, then you can't search the whole of a forum or thread, which somewhat defeats the purpose... I'll see if I can think of a way to make it work.
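A back-of-envelope way to work out the space is: size ≈ number of posts × average bytes per post × some overhead for indexes and row metadata. The sketch below only does that arithmetic; the post count, the average sizes and the 1.5 overhead factor are placeholder assumptions, not measurements of the GOG forums.

```python
def estimate_storage_mb(num_posts: int, avg_bytes_per_post: int, overhead: float = 1.5) -> float:
    """Rough database size in MB: posts x average size x an assumed overhead factor
    (a guess covering indexes and per-row metadata, not a measured figure)."""
    return num_posts * avg_bytes_per_post * overhead / (1024 * 1024)


# Placeholder numbers for illustration only
full_text = estimate_storage_mb(1_000_000, 500)      # every post, full text
metadata_only = estimate_storage_mb(1_000_000, 100)  # title, poster name, date
print(f"full text: {full_text:.0f} MB, metadata only: {metadata_only:.0f} MB")
# full text: 715 MB, metadata only: 143 MB
```

Plugging in a real post count and a sampled average post size would show fairly quickly whether full text fits in the hosting package or whether trimming is needed.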
I think I may already have hit the block while running tests. I'll try to speak to someone, but if they wouldn't do it for you with MaGog, I doubt they would for me either.