Nytro Posted February 2, 2014 Report Posted February 2, 2014 [h=1]Portable Efficient Assembly Code-generator in Higher-level Python (PeachPy)[/h]PeachPy is a Python framework for writing high-performance assembly kernels. PeachPy is developed to simplify writing optimized assembly kernels while preserving all optimization opportunities of traditional assembly. Some PeachPy features: Automatic register allocationStack frame management, including re-aligning of stack frame as neededGenerating versions of a function for different calling conventions from the same source (e.g. functions for Microsoft x64 ABI and System V x86-64 ABI can be generated from the same source)Allows to define constants in the place where they are used (just like in high-level languages)Tracking of instruction extensions used in the function.Multiplexing of multiple instruction streams (helpful for software pipelining) [h=2]Installation[/h] PeachPy can be installed from PyPI pip install PeachPy from peachpy.x64 import *# Use 'x64-ms' for Microsoft x64 ABIabi = peachpy.c.ABI('x64-sysv')assembler = Assembler(abi)# Implement function void add_1(const uint32_t *src, uint32_t *dst, size_t length)src_argument = peachpy.c.Parameter("src", peachpy.c.Type("const uint32_t*"))dst_argument = peachpy.c.Parameter("dst", peachpy.c.Type("uint32_t*"))len_argument = peachpy.c.Parameter("length", peachpy.c.Type("size_t"))# This optimized kernel will target Intel Nehalem processors. Any instructions which are not# supported on Intel Nehalem (e.g. AVX instructions) will generate an error. If you don't have# a particular target in mind, use "Unknown"with Function(assembler, "add_1", (src_argument, dst_argument, len_argument), "Nehalem"): # Load arguments into registers srcPointer = GeneralPurposeRegister64() LOAD.PARAMETER( srcPointer, src_argument ) dstPointer = GeneralPurposeRegister64() LOAD.PARAMETER( dstPointer, dst_argument ) length = GeneralPurposeRegister64() LOAD.PARAMETER( length, len_argument ) # Main processing loop. Length must be a multiple of 4. LABEL( 'loop' ) x = SSERegister() MOVDQU( x, [srcPointer] ) ADD( srcPointer, 16 ) # Add 1 to x PADDD( x, Constant.uint32x4(1) ) MOVDQU( [dstPointer], x ) ADD( dstPointer, 16 ) SUB( length, 4 ) JNZ( 'loop' ) RETURN()print assemblerSursa: https://bitbucket.org/MDukhan/peachpy Quote