0-Network Morphism


Network Morphism, ICML 2016

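Background

In deep learning, designing and optimizing model architectures is complex and time-consuming. Existing approaches typically train each new model from scratch or run expensive hyperparameter searches. The paper introduces network morphism: modifying the structure of an existing network so that its performance is preserved or improved, while reducing training time and resources.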

Method

Network morphism comprises the following main operations:

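Adding layers (Layer Morphism):
- Insert a new layer into an existing network such that, at initialization, the output of the new network matches the output of the original network.
- Initialize the new layer as an identity mapping so that it does not change the initial output.
Mathematical form:

$h' = f(Wx + b)$

where $f$ is the activation function, $x$ is the input to the new layer, and $W$ and $b$ are the new layer's weight matrix and bias, initialized to the identity matrix and the zero vector respectively.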
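The identity initialization described above can be checked numerically. The following is a minimal pure-Python sketch, not the paper's implementation: the helper names `dense` and `identity_layer` and the toy weights are assumptions made for illustration. It also relies on ReLU satisfying f(f(z)) = f(z); for an activation without that property (sigmoid, for example), an identity-initialized layer would change the output.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, z) for z in v]


def dense(W, b, x, activation=relu):
    """Compute activation(W x + b); W is a list of rows, one row per output unit."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return activation(pre)


def identity_layer(n):
    """Hypothetical helper: weights of an n-unit layer initialized to I and 0."""
    W = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [0.0] * n
    return W, b


# Toy "original" layer with ReLU activation (values chosen arbitrarily).
W1 = [[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5]]
b1 = [0.1, -0.2, 0.0]
x = [1.0, 2.0]
h = dense(W1, b1, x)

# Insert a new layer after it, initialized as the identity mapping.
W_id, b_id = identity_layer(len(h))
h_new = dense(W_id, b_id, h)

# The deeper network reproduces the original activations at initialization.
print(h_new == h)
```

After the insertion, training continues from these weights, so the new layer can move away from the identity while the starting point matches the original network.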
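Adding nodes (Node Morphism):
- Increase the number of nodes in a layer while keeping the output unchanged.
- Copy and adjust the weights so that the new nodes do not change the initial output.
Mathematical form:

$W' = [W, W_{\text{new}}]$

where $W_{\text{new}}$ holds the weights of the newly added nodes, initialized as copies of the original weights.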
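One way to make the copy-and-adjust idea concrete is sketched below. It assumes that a hidden unit is duplicated and that its outgoing weights in the next layer are split evenly between the original and the copy. This splitting rule and the helper names (`widen`, `forward`) are illustrative assumptions, not details taken from the paper. Weight matrices are stored with one row per unit, so the concatenation of $W$ and $W_{\text{new}}$ shows up as an appended row.

```python
def relu(v):
    return [max(0.0, z) for z in v]


def affine(W, b, x):
    """Compute W x + b; W is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def forward(W1, b1, W2, b2, x):
    """Two-layer network: linear output on top of one ReLU hidden layer."""
    return affine(W2, b2, relu(affine(W1, b1, x)))


def widen(W1, b1, W2, unit):
    """Duplicate hidden unit `unit` (assumed scheme, see the text above).

    Incoming weights and bias are copied; the outgoing weight of the unit
    is split in half between the original and the copy.
    """
    W1_new = [row[:] for row in W1] + [W1[unit][:]]
    b1_new = b1[:] + [b1[unit]]
    W2_new = []
    for row in W2:
        half = row[unit] / 2.0
        new_row = row[:]
        new_row[unit] = half
        new_row.append(half)
        W2_new.append(new_row)
    return W1_new, b1_new, W2_new


# Toy network with 2 inputs, 2 hidden units, 1 output.
W1 = [[1.0, -0.5], [0.25, 2.0]]
b1 = [0.5, -1.0]
W2 = [[2.0, -1.0]]
b2 = [0.25]
x = [1.0, 3.0]

y_before = forward(W1, b1, W2, b2, x)
W1_wide, b1_wide, W2_wide = widen(W1, b1, W2, unit=1)
y_after = forward(W1_wide, b1_wide, W2_wide, b2, x)

# The widened network has 3 hidden units but computes the same output.
print(len(W1_wide), y_before, y_after)
```

Copying the incoming weights alone would double the duplicated unit's contribution, which is why the outgoing weights have to be adjusted as well.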
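Changing the activation function (Activation Morphism):
- Change an activation function in the network while keeping the network's output unchanged.
- Use suitable initialization and adjustment so that the new activation function is initially equivalent to the original one.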
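A simple way to picture an output-preserving activation change is to blend the old and new activations with a parameter that starts at the old one. The sketch below assumes such a blend; the function `blended_activation` and the parameter `alpha` are hypothetical names introduced here, and the exact parameterization used in the paper may differ.

```python
import math


def relu(z):
    return max(0.0, z)


def blended_activation(old, new, alpha):
    """Return z -> (1 - alpha) * old(z) + alpha * new(z).

    alpha = 0 reproduces `old` exactly, alpha = 1 reproduces `new`;
    in practice alpha would be a trainable parameter.
    """
    def act(z):
        return (1.0 - alpha) * old(z) + alpha * new(z)
    return act


inputs = [-2.0, -0.5, 0.0, 0.75, 3.0]

# At initialization the blended activation equals the original ReLU ...
act_init = blended_activation(relu, math.tanh, alpha=0.0)
same_as_old = all(act_init(z) == relu(z) for z in inputs)

# ... and it becomes the new activation (tanh) once alpha reaches 1.
act_final = blended_activation(relu, math.tanh, alpha=1.0)
same_as_new = all(act_final(z) == math.tanh(z) for z in inputs)

print(same_as_old, same_as_new)
```

Because the network function is unchanged at `alpha = 0`, training can start from the original performance and move toward the new activation gradually.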

Experimental Results

MNIST

CIFAR10

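Experiments on CIFAR-10, CIFAR-100, and ImageNet validate the effectiveness of network morphism.

The results show that network morphism can quickly adjust and optimize an existing network structure without degrading its performance.

Compared with training a new network from scratch, network morphism significantly reduces training time and computational resources.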

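The main contributions are:

- A systematic method for transforming the structure of a neural network without degrading the performance of the existing model.
- A set of transformation operations, such as adding layers, adding nodes, and changing activation functions, for adjusting and optimizing the network.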

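Conclusion

Network morphism offers an efficient way to adjust and optimize models: the structure can be changed quickly while the performance of the existing network is preserved. This is especially valuable in deep learning research and applications where model structures must be adjusted and optimized frequently. Future work could explore further types of morphing operations and applications to more complex tasks.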

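*Code Analysis

A popular open-source application of network morphism is autokeras, whose network structure is represented by a Keras-based graph model, graph.

In the graph class, each node is an intermediate tensor between layers, and each layer is an edge of the graph.

The graph class stores all nodes (with their shapes and ids), all layers (the layer objects and their ids), and the relationships between them (each layer's input and output nodes, plus the adjacency matrix).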
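The data layout described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the autokeras class: the name `TensorGraph`, its fields, and its methods are hypothetical, and an adjacency list stands in for the adjacency matrix.

```python
from dataclasses import dataclass, field


@dataclass
class TensorGraph:
    """Hypothetical minimal graph: nodes are tensors, layers are edges."""
    node_shapes: dict = field(default_factory=dict)   # node id -> tensor shape
    layers: dict = field(default_factory=dict)        # layer id -> layer description
    layer_input: dict = field(default_factory=dict)   # layer id -> input node id
    layer_output: dict = field(default_factory=dict)  # layer id -> output node id
    adjacency: dict = field(default_factory=dict)     # node id -> [(next node id, layer id)]

    def add_node(self, shape):
        """Register an intermediate tensor and return its id."""
        node_id = len(self.node_shapes)
        self.node_shapes[node_id] = shape
        self.adjacency[node_id] = []
        return node_id

    def add_layer(self, layer, input_id, output_id):
        """Register a layer as an edge from input_id to output_id."""
        layer_id = len(self.layers)
        self.layers[layer_id] = layer
        self.layer_input[layer_id] = input_id
        self.layer_output[layer_id] = output_id
        self.adjacency[input_id].append((output_id, layer_id))
        return layer_id


# Usage: a two-layer chain, input -> conv -> relu.
g = TensorGraph()
n0 = g.add_node((32, 32, 3))
n1 = g.add_node((32, 32, 16))
n2 = g.add_node((32, 32, 16))
conv_id = g.add_layer("conv3x3", n0, n1)
relu_id = g.add_layer("relu", n1, n2)
print(g.adjacency)
```

Storing layers as edges makes a morphism a local graph edit: inserting a layer means adding one node and rewiring a single edge.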

# in autokeras/graph.py
def _build_network(self):
    self._node_to_id = {}
    # Recursively find all the nodes of interest.
    for input_node in self.inputs:
        self._search_network(input_node, self.outputs, set(), set())
    self._nodes = sorted(
        list(self._node_to_id.keys()), key=lambda x: self._node_to_id[x]
    )

    for node in self.inputs + self.outputs:
        if node not in self._node_to_id:
            raise ValueError("Inputs and outputs not connected.")

    # Find the blocks.
    blocks = []
    for input_node in self._nodes:
        for block in input_node.out_blocks:
            if (
                any(
                    [
                        output_node in self._node_to_id
                        for output_node in block.outputs
                    ]
                )
                and block not in blocks
            ):
                blocks.append(block)

    # Check if all the inputs of the blocks are set as inputs.
    for block in blocks:
        for input_node in block.inputs:
            if input_node not in self._node_to_id:
                raise ValueError(
                    "A required input is missing for HyperModel "
                    "{name}.".format(name=block.name)
                )

    # Calculate the in degree of all the nodes
    in_degree = [0] * len(self._nodes)
    for node_id, node in enumerate(self._nodes):
        in_degree[node_id] = len(
            [block for block in node.in_blocks if block in blocks]
        )

    # Add the blocks in topological order.
    self.blocks = []
    self._block_to_id = {}
    while len(blocks) != 0:
        new_added = []

        # Collect blocks with in degree 0.
        for block in blocks:
            if any([in_degree[self._node_to_id[node]] for node in block.inputs]):
                continue
            new_added.append(block)

        # Remove the collected blocks from blocks.
        for block in new_added:
            blocks.remove(block)

        for block in new_added:
            # Add the collected blocks to the Graph.
            self._add_block(block)

            # Decrease the in degree of the output nodes.
            for output_node in block.outputs:
                output_node_id = self._node_to_id[output_node]
                in_degree[output_node_id] -= 1
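The last part of `_build_network` is a topological sort: a block is emitted once every one of its input nodes has in-degree 0, and emitting it decrements the in-degree of its output nodes. The standalone sketch below isolates that idea. The function name `topological_block_order` and the dictionary-based block format are assumptions for illustration, and the cycle check is an addition that the excerpt above does not contain.

```python
def topological_block_order(blocks):
    """Order blocks so that each one comes after the blocks producing its inputs.

    blocks: dict mapping block name -> (input node labels, output node labels).
    """
    # In-degree of a node = number of blocks that produce it.
    in_degree = {}
    for inputs, outputs in blocks.values():
        for node in inputs:
            in_degree.setdefault(node, 0)
        for node in outputs:
            in_degree[node] = in_degree.get(node, 0) + 1

    remaining = dict(blocks)
    order = []
    while remaining:
        # Collect blocks whose inputs are all already available.
        ready = [
            name
            for name, (inputs, _) in remaining.items()
            if all(in_degree[node] == 0 for node in inputs)
        ]
        if not ready:
            raise ValueError("The blocks contain a cycle.")
        for name in ready:
            _, outputs = remaining.pop(name)
            order.append(name)
            # Outputs of an emitted block no longer wait on it.
            for node in outputs:
                in_degree[node] -= 1
    return order


# Blocks are listed out of order on purpose.
example_blocks = {
    "dense": (["n1"], ["n2"]),
    "conv": (["x"], ["n1"]),
    "head": (["n2"], ["y"]),
}
order = topological_block_order(example_blocks)
print(order)  # ['conv', 'dense', 'head']
```

As in the excerpt, all ready blocks are collected before any in-degree is decremented, so blocks are emitted level by level.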
