社区所有版块导航
Python
python开源   Django   Python   DjangoApp   pycharm  
DATA
docker   Elasticsearch  
aigc
aigc   chatgpt  
WEB开发
linux   MongoDB   Redis   DATABASE   NGINX   其他Web框架   web工具   zookeeper   tornado   NoSql   Bootstrap   js   peewee   Git   bottle   IE   MQ   Jquery  
机器学习
机器学习算法  
Python88.com
反馈   公告   社区推广  
产品
短视频  
印度
印度  
Py学习  »  DATABASE

在mysql中,如何在没有前导零的情况下排序数字

Cain Nuke • 3 年前 • 1871 次点击  

有没有一种优雅的方式可以在MySQL数据库中进行性能良好的自然排序?

例如,如果我有这个数据集:

  • 最终幻想
  • 最终幻想4
  • 最终幻想10
  • 最终幻想12
  • 最终幻想12:Promathia之链
  • 最终幻想冒险
  • 最终幻想起源
  • 最终幻想战术

还有别的吗 优雅的 解决方案不是将游戏名称拆分为组件

  • 标题 :“最终幻想”
  • 数字 : "12"
  • 字幕 :“Promathia之链”

以确保它们以正确的顺序出现?(4之后10,而不是2之前)。

这样做对a**来说是一种痛苦,因为时不时会有另一款游戏打破解析游戏名称的机制(例如《战锤40000》、《詹姆斯·邦德007》)

Python社区是高质量的Python/Django开发社区
本文地址:http://www.python88.com/topic/133189
 
1871 次点击  
文章 [ 22 ]  |  最新文章 3 年前
Frank Forte
Reply   •   1 楼
Frank Forte    4 年前

以下是一个简单的例子,如果标题的版本只是一个数字:

ORDER BY CAST(REGEXP_REPLACE(title, "[a-zA-Z]+", "") AS INT)';

否则,如果使用模式(此模式在版本之前使用#),则可以使用简单SQL:

create table titles(title);

insert into titles (title) values 
('Final Fantasy'),
('Final Fantasy #03'),
('Final Fantasy #11'),
('Final Fantasy #10'),
('Final Fantasy #2'),
('Bond 007 ##2'),
('Final Fantasy #01'),
('Bond 007'),
('Final Fantasy #11}');

select REGEXP_REPLACE(title, "#([0-9]+)", "\\1") as title from titles
ORDER BY REGEXP_REPLACE(title, "#[0-9]+", ""),
CAST(REGEXP_REPLACE(title, ".*#([0-9]+).*", "\\1") AS INT);     
+-------------------+
| title             |
+-------------------+
| Bond 007          |
| Bond 007 #2       |
| Final Fantasy     |
| Final Fantasy 01  |
| Final Fantasy 2   |
| Final Fantasy 03  |
| Final Fantasy 10  |
| Final Fantasy 11  |
| Final Fantasy 11} |
+-------------------+
8 rows in set, 2 warnings (0.001 sec)

如果需要,可以使用其他模式。 例如,如果你有一部电影《I'm#1》和《I'm#1第2部分》,那么可以将版本包装起来,例如《最终幻想》{11}

Peter V. Mørch
Reply   •   2 楼
Peter V. Mørch    14 年前

还有 natsort .它旨在成为 drupal plugin ,但它可以单独工作。

bonger
Reply   •   3 楼
bonger    10 年前

@plaix/Richard Toth/Luke Hoggett的最佳响应的一个简化的非udf版本是

SELECT name,
LEAST(
    IFNULL(NULLIF(LOCATE('0', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('1', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('2', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('3', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('4', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('5', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('6', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('7', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('8', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('9', name), 0), ~0)
) AS first_int
FROM table
ORDER BY IF(first_int = ~0, name, CONCAT(
    SUBSTR(name, 1, first_int - 1),
    LPAD(CAST(SUBSTR(name, first_int) AS UNSIGNED), LENGTH(~0), '0'),
    SUBSTR(name, first_int + LENGTH(CAST(SUBSTR(name, first_int) AS UNSIGNED)))
)) ASC
Bob Fanger
Reply   •   4 楼
Bob Fanger    14 年前

如果你使用的是PHP,你可以在PHP中进行自然排序。

$keys = array();
$values = array();
foreach ($results as $index => $row) {
   $key = $row['name'].'__'.$index; // Add the index to create an unique key.
   $keys[] = $key;
   $values[$key] = $row; 
}
natsort($keys);
$sortedValues = array(); 
foreach($keys as $index) {
  $sortedValues[] = $values[$index]; 
}

我希望MySQL在未来的版本中能够实现自然排序,但是 feature request (#1588) 从2003年开始营业,所以我不会屏住呼吸。

Cœur Bob
Reply   •   5 楼
Cœur Bob    6 年前

另一个选择是在从mysql中提取数据后在内存中进行排序。虽然从性能的角度来看,这不是最好的选择,但如果你没有对大量的列表进行排序,你应该会没事的。

如果你看一下Jeff的帖子,你会发现很多算法适用于你可能使用的语言。 Sorting for Humans : Natural Sort Order

Community KP.
Reply   •   6 楼
Community KP.    8 年前

关于Richard Toth的最佳回应 https://stackoverflow.com/a/12257917/4052357

注意UTF8编码的字符串,这些字符串包含2字节(或更多)字符和数字,例如。

12 南新宿

使用MySQL的 LENGTH() 在里面 udf_NaturalSortFormat 函数将返回字符串的字节长度,如果不正确,请使用 CHAR_LENGTH() 将返回正确的字符长度。

就我而言,使用 长度() 导致查询永远无法完成,导致MySQL的CPU使用率达到100%

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;

p、 我本想把这句话作为对原文的评论,但我的声誉还不够好

ConroyP
Reply   •   7 楼
ConroyP    16 年前

MySQL不允许这种“自然排序”,因此,获取所需内容的最佳方法似乎是按照上述方式拆分数据设置(单独的id字段等),否则,根据数据库中的非标题元素、索引元素(日期、数据库中插入的id等)执行排序。

让数据库为您进行排序几乎总是比将大数据集读入您选择的编程语言并在那里排序要快,因此,如果您对这里的数据库模式有任何控制,那么请考虑如上所述添加易于排序的字段,从长远来看,这将为您节省大量的麻烦和维护。

网站上不时会出现添加“自然排序”的请求 MySQL bugs discussion forums ,许多解决方案都围绕着剥离数据的特定部分并将其转换为 ORDER BY 查询的一部分,例如。

SELECT * FROM table ORDER BY CAST(mid(name, 6, LENGTH(c) -5) AS unsigned) 

这种解决方案可以用于上面的最终幻想示例,但不是特别灵活,也不太可能完全扩展到数据集,比如“战锤40000”和“詹姆斯·邦德007”。

Tarik
Reply   •   8 楼
Tarik    8 年前

我尝试过几种解决方案,但实际情况非常简单:

SELECT test_column FROM test_table ORDER BY LENGTH(test_column) DESC, test_column DESC

/* 
Result 
--------
value_1
value_2
value_3
value_4
value_5
value_6
value_7
value_8
value_9
value_10
value_11
value_12
value_13
value_14
value_15
...
*/
Antoine
Reply   •   9 楼
Antoine    11 年前

您还可以动态创建“排序列”:

SELECT name, (name = '-') boolDash, (name = '0') boolZero, (name+0 > 0) boolNum 
FROM table 
ORDER BY boolDash DESC, boolZero DESC, boolNum DESC, (name+0), name

这样,就可以创建组进行排序。

在我的查询中,我希望所有内容前面都有“-”,然后是数字,然后是文本。这可能会导致如下结果:

-
0    
1
2
3
4
5
10
13
19
99
102
Chair
Dog
Table
Windows

这样,您就不必在添加数据时以正确的顺序维护排序列。您还可以根据需要更改排序顺序。

user1467716
Reply   •   10 楼
user1467716    12 年前

我知道这个话题很古老,但我想我已经找到了一种方法:

SELECT * FROM `table` ORDER BY 
CONCAT(
  GREATEST(
    LOCATE('1', name),
    LOCATE('2', name),
    LOCATE('3', name),
    LOCATE('4', name),
    LOCATE('5', name),
    LOCATE('6', name),
    LOCATE('7', name),
    LOCATE('8', name),
    LOCATE('9', name)
   ),
   name
) ASC

废话,它对以下集合的排序不正确(这是无用的lol):

最终幻想1 最终幻想2 最终幻想5 最终幻想7 最终幻想7:降临儿童 最终幻想12 最终幻想112 FF1 FF2

Guma
Reply   •   11 楼
Guma    9 年前

订购:
0
1.
2.
10
23
101
205
1000
A.
aac
B
卡萨德萨酒店
css

使用此查询:

SELECT 
    column_name 
FROM 
    table_name 
ORDER BY
    column_name REGEXP '^\d*[^\da-z&\.\' \-\"\!\@\#\$\%\^\*\(\)\;\:\\,\?\/\~\`\|\_\-]' DESC, 
    column_name + 0, 
    column_name;
tye
Reply   •   12 楼
tye    16 年前

为“排序键”添加一个字段,该字段将所有数字字符串零填充为固定长度,然后在该字段上进行排序。

如果您可能有长串数字,另一种方法是为每个数字串预先设置数字数(固定宽度,零填充)。例如,如果一行中的数字不超过99位,那么对于“Super Blast 10 Ultra”,排序键将是“Super Blast 0210 Ultra”。

FilmJ
Reply   •   13 楼
FilmJ    13 年前

所以,虽然我知道你已经找到了一个令人满意的答案,但我在这个问题上挣扎了一段时间,我们之前已经确定在SQL中做得不太好,我们将不得不在JSON数组上使用javascript。

下面是我如何使用SQL解决这个问题的。希望这对其他人有帮助:

我有如下数据:

Scene 1
Scene 1A
Scene 1B
Scene 2A
Scene 3
...
Scene 101
Scene XXA1
Scene XXA2

事实上,我并没有“铸造”东西,尽管我认为这可能也有用。

我首先替换了数据中不变的部分,在本例中是“场景”,然后做了一个LPAD来排列。这似乎可以很好地让alpha字符串与编号字符串一样正确排序。

我的 ORDER BY 条款看起来像:

ORDER BY LPAD(REPLACE(`table`.`column`,'Scene ',''),10,'0')

显然,这无助于解决最初的问题,因为这个问题并不统一——但我想这可能适用于许多其他相关的问题,所以把它放在那里。

Neto Queiroz
Reply   •   14 楼
Neto Queiroz    8 年前

如果你不想重新发明轮子,或者对大量不起作用的代码感到头疼,就使用 Drupal Natural Sort ... 只要运行压缩后的SQL(MySQL或Postgre),就可以了。进行查询时,只需使用以下命令:

... ORDER BY natsort_canon(column_name, 'natural')
JeeBee
Reply   •   15 楼
JeeBee    13 年前
  1. 在表中添加排序键(秩)。 ORDER BY rank

  2. 使用“发布日期”栏。 ORDER BY release_date

  3. 从SQL中提取数据时,让对象进行排序,例如,如果提取到一个集合中,则将其设置为树集,并使数据模型实现可比性,并在此处执行自然排序算法(如果使用的是没有集合的语言,则插入排序就足够了),因为在创建模型并将其插入集合时,您将逐个读取SQL中的行)

plalx
Reply   •   16 楼
plalx    11 年前

我为你写了这个函数 MSSQL 2000 不久前:

/**
 * Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
 *
 * @author Alexandre Potvin Latreille (plalx)
 * @param {nvarchar(4000)} string The formatted string.
 * @param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
 * @param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
 *
 * @return {nvarchar(4000)} A string for natural sorting.
 * Example of use: 
 * 
 *      SELECT Name FROM TableA ORDER BY Name
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                    ID  Name
 *  1.  A1.                 1.  A1-1.       
 *  2.  A1-1.                   2.  A1.
 *  3.  R1      -->         3.  R1
 *  4.  R11                 4.  R11
 *  5.  R2                  5.  R2
 *
 *  
 *  As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
 *  We can use this function to fix this.
 *
 *      SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                    ID  Name
 *  1.  A1.                 1.  A1.     
 *  2.  A1-1.                   2.  A1-1.
 *  3.  R1      -->         3.  R1
 *  4.  R11                 4.  R2
 *  5.  R2                  5.  R11
 */
CREATE FUNCTION dbo.udf_NaturalSortFormat(
    @string nvarchar(4000),
    @numberLength int = 10,
    @sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
    DECLARE @sortString varchar(4000),
        @numStartIndex int,
        @numEndIndex int,
        @padLength int,
        @totalPadLength int,
        @i int,
        @sameOrderCharsLen int;

    SELECT 
        @totalPadLength = 0,
        @string = RTRIM(LTRIM(@string)),
        @sortString = @string,
        @numStartIndex = PATINDEX('%[0-9]%', @string),
        @numEndIndex = 0,
        @i = 1,
        @sameOrderCharsLen = LEN(@sameOrderChars);

    -- Replace all char that has to have the same order by a space.
    WHILE (@i <= @sameOrderCharsLen)
    BEGIN
        SET @sortString = REPLACE(@sortString, SUBSTRING(@sameOrderChars, @i, 1), ' ');
        SET @i = @i + 1;
    END

    -- Pad numbers with zeros.
    WHILE (@numStartIndex <> 0)
    BEGIN
        SET @numStartIndex = @numStartIndex + @numEndIndex;
        SET @numEndIndex = @numStartIndex;

        WHILE(PATINDEX('[0-9]', SUBSTRING(@string, @numEndIndex, 1)) = 1)
        BEGIN
            SET @numEndIndex = @numEndIndex + 1;
        END

        SET @numEndIndex = @numEndIndex - 1;

        SET @padLength = @numberLength - (@numEndIndex + 1 - @numStartIndex);

        IF @padLength < 0
        BEGIN
            SET @padLength = 0;
        END

        SET @sortString = STUFF(
            @sortString,
            @numStartIndex + @totalPadLength,
            0,
            REPLICATE('0', @padLength)
        );

        SET @totalPadLength = @totalPadLength + @padLength;
        SET @numStartIndex = PATINDEX('%[0-9]%', RIGHT(@string, LEN(@string) - @numEndIndex));
    END

    RETURN @sortString;
END

GO
Federico Razzoli
Reply   •   17 楼
Federico Razzoli    3 年前

其他答案是正确的,但您可能想知道MariaDB 10.7将有一个 natural_sort_key() 作用在撰写本文时,它仅作为预览版提供。对其功能进行了说明 here .

Michael Haren
Reply   •   18 楼
Michael Haren    16 年前

我想这就是为什么很多东西都是按发布日期排序的。

解决方案可能是在表中为“SortKey”创建另一列。这可能是标题的净化版本,符合您创建的模式,以便于排序或计数器。

Doin
Reply   •   19 楼
Doin    5 年前

我在这里看到的许多其他答案(以及重复的问题)基本上只适用于 非常 特定格式的数据,例如完全是数字的字符串,或有固定长度字母前缀的字符串。这在一般情况下是行不通的。

的确,在MySQL中并没有实现100%通用nat排序的方法,因为要做到这一点 真正地 需要是一个经过修改的概念 比较函数 ,它在字符串的字典排序和遇到数字时的数字排序之间切换。这样的代码可以实现任何你想要识别和比较两个字符串中数字部分的算法。不幸的是,MySQL中的比较函数是其代码内部的,用户无法更改。

这就留下了某种黑客,你试图创建一个 排序键 用于重新格式化数字部分的字符串,以便 标准的字典排序实际上是按照你想要的方式对它们进行排序 .

对于最大位数的纯整数,最明显的解决方案是简单地用零填充它们,这样它们的宽度都是固定的。这是Drupal插件采用的方法,以及@plalx/@RichardToth的解决方案。(@Christian有一个不同且复杂得多的解决方案,但它没有我所看到的优势)。

正如@tye所指出的,您可以通过为每个数字预先设置一个固定的数字长度来改进这个问题,而不是简单地将其填充。不过,即使考虑到本质上是一种笨拙的黑客行为的局限性,你仍有很多可以改进的地方。然而,似乎没有任何预先构建的解决方案!

例如,关于:

  • 正负号+10对10对-10
  • 小数?8.2, 8.5, 1.006, .75
  • 前导零?020, 030, 00000922
  • 一千个分离器?“1001斑点狗”对“1001斑点狗”
  • 版本号?MariaDB v10。3.18对MariaDB v10。3.3
  • 很长的数字?103,768,276,592,092,364,859,236,487,687,870,234,598.55

在@tye方法的基础上,我创建了一个相当紧凑的NatSortKey()存储函数,它可以将任意字符串转换为nat排序键,处理上述所有情况,效率相当高,并保留了总的排序顺序(没有两个不同的字符串具有相同的排序键)。第二个参数可用于限制每个字符串中处理的数字数量(例如,前10个数字),这可用于确保输出符合给定长度。

注意:使用此第二个参数的给定值生成的排序键字符串只能与使用 相同的 参数的值,否则它们可能无法正确排序!

你可以在订购时直接使用它,例如。

SELECT myString FROM myTable ORDER BY NatSortKey(myString,0);  ### 0 means process all numbers - resulting sort key might be quite long for certain inputs

但为了高效地对大型表进行排序,最好将排序键预先存储在另一列中(可能带有索引):

INSERT INTO myTable (myString,myStringNSK) VALUES (@theStringValue,NatSortKey(@theStringValue,10)), ...
...
SELECT myString FROM myTable ORDER BY myStringNSK;

[理想情况下,您可以通过以下方式自动实现这一点:将键列创建为计算存储列:

CREATE TABLE myTable (
...
myString varchar(100),
myStringNSK varchar(150) AS (NatSortKey(myString,10)) STORED,
...
KEY (myStringNSK),
...);

但现在 MySQL和MariaDB都不允许在计算列中存储函数 很不幸 你还不能这么做 .]


我的职能 只影响数字的排序 。如果要执行其他排序规范化操作,例如删除所有标点符号,或修剪每一端的空格,或用单个空格替换多个空格序列,则可以扩展该函数,也可以在之前或之后执行 NatSortKey() 应用于您的数据。(我建议使用 REGEXP_REPLACE() 为此目的)。

我认为这也有点以盎格鲁为中心对于小数点和“,”表示千位分隔符,但是如果您想要反转,或者想要将其作为一个参数进行切换,则应该很容易进行修改。

它可能需要在其他方面进一步改进;例如,它目前按绝对值对负数进行排序,因此-1在-2之前,而不是相反。在保留文本的ASC字典排序的同时,也无法为数字指定DESC排序顺序。这两个问题都可以通过更多的工作来解决;如果有时间,我会更新代码。

还有很多其他细节需要注意- 包括对正在使用的chaset和collation的一些关键依赖项 -但是我把它们都放进了SQL代码中的注释块中。 在自己使用该功能之前,请仔细阅读!

下面是代码。如果你发现一个错误,或者有一个我没有提到的改进,请在评论中告诉我!


delimiter $$
CREATE DEFINER=CURRENT_USER FUNCTION NatSortKey (s varchar(100), n int) RETURNS varchar(350) DETERMINISTIC
BEGIN
/****
  Converts numbers in the input string s into a format such that sorting results in a nat-sort.
  Numbers of up to 359 digits (before the decimal point, if one is present) are supported.  Sort results are undefined if the input string contains numbers longer than this.
  For n>0, only the first n numbers in the input string will be converted for nat-sort (so strings that differ only after the first n numbers will not nat-sort amongst themselves).
  Total sort-ordering is preserved, i.e. if s1!=s2, then NatSortKey(s1,n)!=NatSortKey(s2,n), for any given n.
  Numbers may contain ',' as a thousands separator, and '.' as a decimal point.  To reverse these (as appropriate for some European locales), the code would require modification.
  Numbers preceded by '+' sort with numbers not preceded with either a '+' or '-' sign.
  Negative numbers (preceded with '-') sort before positive numbers, but are sorted in order of ascending absolute value (so -7 sorts BEFORE -1001).
  Numbers with leading zeros sort after the same number with no (or fewer) leading zeros.
  Decimal-part-only numbers (like .75) are recognised, provided the decimal point is not immediately preceded by either another '.', or by a letter-type character.
  Numbers with thousand separators sort after the same number without them.
  Thousand separators are only recognised in numbers with no leading zeros that don't immediately follow a ',', and when they format the number correctly.
  (When not recognised as a thousand separator, a ',' will instead be treated as separating two distinct numbers).
  Version-number-like sequences consisting of 3 or more numbers separated by '.' are treated as distinct entities, and each component number will be nat-sorted.
  The entire entity will sort after any number beginning with the first component (so e.g. 10.2.1 sorts after both 10 and 10.995, but before 11)
  Note that The first number component in an entity like this is also permitted to contain thousand separators.

  To achieve this, numbers within the input string are prefixed and suffixed according to the following format:
  - The number is prefixed by a 2-digit base-36 number representing its length, excluding leading zeros.  If there is a decimal point, this length only includes the integer part of the number.
  - A 3-character suffix is appended after the number (after the decimals if present).
    - The first character is a space, or a '+' sign if the number was preceded by '+'.  Any preceding '+' sign is also removed from the front of the number.
    - This is followed by a 2-digit base-36 number that encodes the number of leading zeros and whether the number was expressed in comma-separated form (e.g. 1,000,000.25 vs 1000000.25)
    - The value of this 2-digit number is: (number of leading zeros)*2 + (1 if comma-separated, 0 otherwise)
  - For version number sequences, each component number has the prefix in front of it, and the separating dots are removed.
    Then there is a single suffix that consists of a ' ' or '+' character, followed by a pair base-36 digits for each number component in the sequence.

  e.g. here is how some simple sample strings get converted:
  'Foo055' --> 'Foo0255 02'
  'Absolute zero is around -273 centigrade' --> 'Absolute zero is around -03273 00 centigrade'
  'The $1,000,000 prize' --> 'The $071000000 01 prize'
  '+99.74 degrees' --> '0299.74+00 degrees'
  'I have 0 apples' --> 'I have 00 02 apples'
  '.5 is the same value as 0000.5000' --> '00.5 00 is the same value as 00.5000 08'
  'MariaDB v10.3.0018' --> 'MariaDB v02100130218 000004'

  The restriction to numbers of up to 359 digits comes from the fact that the first character of the base-36 prefix MUST be a decimal digit, and so the highest permitted prefix value is '9Z' or 359 decimal.
  The code could be modified to handle longer numbers by increasing the size of (both) the prefix and suffix.
  A higher base could also be used (by replacing CONV() with a custom function), provided that the collation you are using sorts the "digits" of the base in the correct order, starting with 0123456789.
  However, while the maximum number length may be increased this way, note that the technique this function uses is NOT applicable where strings may contain numbers of unlimited length.

  The function definition does not specify the charset or collation to be used for string-type parameters or variables:  The default database charset & collation at the time the function is defined will be used.
  This is to make the function code more portable.  However, there are some important restrictions:

  - Collation is important here only when comparing (or storing) the output value from this function, but it MUST order the characters " +0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" in that order for the natural sort to work.
    This is true for most collations, but not all of them, e.g. in Lithuanian 'Y' comes before 'J' (according to Wikipedia).
    To adapt the function to work with such collations, replace CONV() in the function code with a custom function that emits "digits" above 9 that are characters ordered according to the collation in use.

  - For efficiency, the function code uses LENGTH() rather than CHAR_LENGTH() to measure the length of strings that consist only of digits 0-9, '.', and ',' characters.
    This works for any single-byte charset, as well as any charset that maps standard ASCII characters to single bytes (such as utf8 or utf8mb4).
    If using a charset that maps these characters to multiple bytes (such as, e.g. utf16 or utf32), you MUST replace all instances of LENGTH() in the function definition with CHAR_LENGTH()

  Length of the output:

  Each number converted adds 5 characters (2 prefix + 3 suffix) to the length of the string. n is the maximum count of numbers to convert;
  This parameter is provided as a means to limit the maximum output length (to input length + 5*n).
  If you do not require the total-ordering property, you could edit the code to use suffixes of 1 character (space or plus) only; this would reduce the maximum output length for any given n.
  Since a string of length L has at most ((L+1) DIV 2) individual numbers in it (every 2nd character a digit), for n<=0 the maximum output length is (inputlength + 5*((inputlength+1) DIV 2))
  So for the current input length of 100, the maximum output length is 350.
  If changing the input length, the output length must be modified according to the above formula.  The DECLARE statements for x,y,r, and suf must also be modified, as the code comments indicate.
****/
  DECLARE x,y varchar(100);            # need to be same length as input s
  DECLARE r varchar(350) DEFAULT '';   # return value:  needs to be same length as return type
  DECLARE suf varchar(101);   # suffix for a number or version string. Must be (((inputlength+1) DIV 2)*2 + 1) chars to support version strings (e.g. '1.2.33.5'), though it's usually just 3 chars. (Max version string e.g. 1.2. ... .5 has ((length of input + 1) DIV 2) numeric components)
  DECLARE i,j,k int UNSIGNED;
  IF n<=0 THEN SET n := -1; END IF;   # n<=0 means "process all numbers"
  LOOP
    SET i := REGEXP_INSTR(s,'\\d');   # find position of next digit
    IF i=0 OR n=0 THEN RETURN CONCAT(r,s); END IF;   # no more numbers to process -> we're done
    SET n := n-1, suf := ' ';
    IF i>1 THEN
      IF SUBSTRING(s,i-1,1)='.' AND (i=2 OR SUBSTRING(s,i-2,1) RLIKE '[^.\\p{L}\\p{N}\\p{M}\\x{608}\\x{200C}\\x{200D}\\x{2100}-\\x{214F}\\x{24B6}-\\x{24E9}\\x{1F130}-\\x{1F149}\\x{1F150}-\\x{1F169}\\x{1F170}-\\x{1F189}]') AND (SUBSTRING(s,i) NOT RLIKE '^\\d++\\.\\d') THEN SET i:=i-1; END IF;   # Allow decimal number (but not version string) to begin with a '.', provided preceding char is neither another '.', nor a member of the unicode character classes: "Alphabetic", "Letter", "Block=Letterlike Symbols" "Number", "Mark", "Join_Control"
      IF i>1 AND SUBSTRING(s,i-1,1)='+' THEN SET suf := '+', j := i-1; ELSE SET j := i; END IF;   # move any preceding '+' into the suffix, so equal numbers with and without preceding "+" signs sort together
      SET r := CONCAT(r,SUBSTRING(s,1,j-1)); SET s = SUBSTRING(s,i);   # add everything before the number to r and strip it from the start of s; preceding '+' is dropped (not included in either r or s)
    END IF;
    SET x := REGEXP_SUBSTR(s,IF(SUBSTRING(s,1,1) IN ('0','.') OR (SUBSTRING(r,-1)=',' AND suf=' '),'^\\d*+(?:\\.\\d++)*','^(?:[1-9]\\d{0,2}(?:,\\d{3}(?!\\d))++|\\d++)(?:\\.\\d++)*+'));   # capture the number + following decimals (including multiple consecutive '.<digits>' sequences)
    SET s := SUBSTRING(s,LENGTH(x)+1);   # NOTE: LENGTH() can be safely used instead of CHAR_LENGTH() here & below PROVIDED we're using a charset that represents digits, ',' and '.' characters using single bytes (e.g. latin1, utf8)
    SET i := INSTR(x,'.');
    IF i=0 THEN SET y := ''; ELSE SET y := SUBSTRING(x,i); SET x := SUBSTRING(x,1,i-1); END IF;   # move any following decimals into y
    SET i := LENGTH(x);
    SET x := REPLACE(x,',','');
    SET j := LENGTH(x);
    SET x := TRIM(LEADING '0' FROM x);   # strip leading zeros
    SET k := LENGTH(x);
    SET suf := CONCAT(suf,LPAD(CONV(LEAST((j-k)*2,1294) + IF(i=j,0,1),10,36),2,'0'));   # (j-k)*2 + IF(i=j,0,1) = (count of leading zeros)*2 + (1 if there are thousands-separators, 0 otherwise)  Note the first term is bounded to <= base-36 'ZY' as it must fit within 2 characters
    SET i := LOCATE('.',y,2);
    IF i=0 THEN
      SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x,y,suf);   # k = count of digits in number, bounded to be <= '9Z' base-36
    ELSE   # encode a version number (like 3.12.707, etc)
      SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x);   # k = count of digits in number, bounded to be <= '9Z' base-36
      WHILE LENGTH(y)>0 AND n!=0 DO
        IF i=0 THEN SET x := SUBSTRING(y,2); SET y := ''; ELSE SET x := SUBSTRING(y,2,i-2); SET y := SUBSTRING(y,i); SET i := LOCATE('.',y,2); END IF;
        SET j := LENGTH(x);
        SET x := TRIM(LEADING '0' FROM x);   # strip leading zeros
        SET k := LENGTH(x);
        SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x);   # k = count of digits in number, bounded to be <= '9Z' base-36
        SET suf := CONCAT(suf,LPAD(CONV(LEAST((j-k)*2,1294),10,36),2,'0'));   # (j-k)*2 = (count of leading zeros)*2, bounded to fit within 2 base-36 digits
        SET n := n-1;
      END WHILE;
      SET r := CONCAT(r,y,suf);
    END IF;
  END LOOP;
END
$$
delimiter ;
Richard Toth
Reply   •   20 楼
Richard Toth    8 年前

与@plalx发布的功能相同,但已重写为MySQL:

DROP FUNCTION IF EXISTS `udf_FirstNumberPos`;
DELIMITER ;;
CREATE FUNCTION `udf_FirstNumberPos` (`instring` varchar(4000)) 
RETURNS int
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE position int;
    DECLARE tmp_position int;
    SET position = 5000;
    SET tmp_position = LOCATE('0', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; 
    SET tmp_position = LOCATE('1', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('2', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('3', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('4', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('5', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('6', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('7', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('8', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('9', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;

    IF (position = 5000) THEN RETURN 0; END IF;
    RETURN position;
END
;;

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;

用法:

SELECT name FROM products ORDER BY udf_NaturalSortFormat(name, 10, ".")