Remove all urlencoding from data in the database
Remove all urlencoding from data in the database to enable more effective full-text searching, especially for non-roman languages.
on 2010-12-11 18:04 *
By
Do we have a full list of the tables and columns that contain urlencoded strings?
on 2010-12-12 02:31 *
By nick_ramsay
Not exactly, but we could do this in parts. The biggest reason for doing it is to enable full text searching of posts, so let's focus on the posts table first.
on 2010-12-15 23:04 *
By
An idea for handling this struck me and I had to note it down.
What if, from a specific version onwards (for example 1.5), we store proper non-urlencoded data in the database and simply don't provide an upgrade script? Instead we provide an option to migrate tables from an older version, letting the user choose which old_installation_table goes to which new_installation_table. During that migration we rewrite the content using PHP's urldecode() function. Have a look:
<?php
// "Hi" in Greek
$hi = 'γειά';
// urlencoding the string
echo $hi_enc = urlencode($hi); // echoes: "%CE%B3%CE%B5%CE%B9%CE%AC"
// urldecoding the normal string("γειά") concatenated with the urlencoded("%CE%B3%CE%B5%CE%B9%CE%AC") string
echo $hi_dec = urldecode($hi.$hi_enc); // echoes: "γειάγειά"
on 2010-12-16 01:41 *
By nick_ramsay
We still have code in the core, and possibly in some plugins, that uses urldecode() on the data.
E.g. in Post.php:
$this->title = stripslashes(urldecode($post_row->post_title));
Are you saying that we leave this code as it is? I guess decoding an already decoded string returns the same value, right?
on 2010-12-16 08:19 *
By
I was suggesting this after we fix those parts in the code.
Yep, decoding an already decoded string returns the same value.
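This holds for text such as the Greek example above, but one caveat may be worth checking before relying on it: urldecode() turns every literal "+" into a space and decodes any "%" followed by two hex digits, so a second pass is only harmless for text that contains neither. A minimal sketch, where the helper name and the sample strings are made up for illustration and are not part of Hotaru:

```php
<?php
// Hypothetical helper: returns true when running urldecode() on $text
// again would leave it unchanged.
function survives_second_decode(string $text): bool
{
    return urldecode($text) === $text;
}

var_dump(survives_second_decode('γειά'));      // bool(true): no '+' or '%XX' present
var_dump(survives_second_decode('C++ tips'));  // bool(false): each '+' becomes a space
var_dump(survives_second_decode('50%25 off')); // bool(false): '%25' becomes '%'
```

If stored text like the last two samples can exist, the remaining urldecode() calls in the core would probably need to be removed at the same time the stored data is converted, rather than left in place.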
I have already removed the urlencoding of data saved to the database in my local working copy and it works as expected. The next step is to produce an upgrade script that goes through the tables and re-saves the existing data in urldecoded form.
The tables and columns I have identified so far are:
- Table hotaru_categories: category_name, category_safename
- Table hotaru_comments: comment_content
- Table hotaru_messaging: message_subject, message_content
- Table hotaru_plugins: plugin_authorurl
- Table hotaru_postmeta: postmeta_value
- Table hotaru_posts: post_title, post_orig_url, post_domain, post_tags, post_content
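The per-row part of such an upgrade script could be sketched roughly as follows. The table and column names come from the list above; the function name decode_row, the post_id column, and the sample row are hypothetical, and the actual database reads and writes are left out so the snippet runs on its own:

```php
<?php
// Columns identified in this thread as holding urlencoded data.
$encodedColumns = [
    'hotaru_categories' => ['category_name', 'category_safename'],
    'hotaru_comments'   => ['comment_content'],
    'hotaru_messaging'  => ['message_subject', 'message_content'],
    'hotaru_plugins'    => ['plugin_authorurl'],
    'hotaru_postmeta'   => ['postmeta_value'],
    'hotaru_posts'      => ['post_title', 'post_orig_url', 'post_domain', 'post_tags', 'post_content'],
];

// Hypothetical helper: returns a copy of $row with the listed columns urldecoded.
// Columns that are missing, NULL, or not strings are left alone.
function decode_row(array $row, array $columns): array
{
    foreach ($columns as $column) {
        if (isset($row[$column]) && is_string($row[$column])) {
            $row[$column] = urldecode($row[$column]);
        }
    }
    return $row;
}

// Made-up sample row standing in for a record read from hotaru_posts.
$row = [
    'post_id'    => 1,
    'post_title' => '%CE%B3%CE%B5%CE%B9%CE%AC',
    'post_tags'  => 'php%2Cmysql',
];
$decoded = decode_row($row, $encodedColumns['hotaru_posts']);

echo $decoded['post_title'], "\n"; // γειά
echo $decoded['post_tags'], "\n";  // php,mysql
```

Since urldecode() also rewrites a literal "+" or "%XX" in text that is already decoded, a script built this way should probably run only once per installation, for example behind a version check.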
on 2010-12-19 22:36 *
By
(In revision:2322) [Branch 1.5] Adding the script that converts the data in the database to urldecoded (aka normal) form. See #196