Nytro

Portable Efficient Assembly Code-generator in Higher-level Python (PeachPy)


[h=1]Portable Efficient Assembly Code-generator in Higher-level Python (PeachPy)[/h]

PeachPy is a Python framework for writing high-performance assembly kernels. It aims to simplify writing optimized assembly kernels while preserving all the optimization opportunities of traditional assembly. Some PeachPy features:

  • Automatic register allocation

  • Stack frame management, including re-aligning the stack frame as needed

  • Generating versions of a function for different calling conventions from the same source (e.g. functions for Microsoft x64 ABI and System V x86-64 ABI can be generated from the same source)

  • Defining constants at the point where they are used (just like in high-level languages)

  • Tracking of the instruction extensions used in the function

  • Multiplexing of multiple instruction streams (helpful for software pipelining)
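To make the calling-convention item in the list above concrete, here is a minimal sketch in plain Python. It does not use PeachPy and is not how PeachPy is implemented. It only shows that the same three pointer/integer parameters arrive in different registers under the System V x86-64 ABI and the Microsoft x64 ABI, which is the bookkeeping a generator has to do for you. The table name, the function name and the ABI keys are assumptions made for this illustration, and stack-passed arguments are deliberately not modelled.

```python
# Registers that carry the first integer/pointer arguments, in order,
# under the two x86-64 calling conventions mentioned in the feature list.
# The dictionary keys are labels chosen for this sketch.
INTEGER_ARGUMENT_REGISTERS = {
    "x64-sysv": ("rdi", "rsi", "rdx", "rcx", "r8", "r9"),
    "x64-ms": ("rcx", "rdx", "r8", "r9"),
}


def assign_argument_registers(abi_name, parameter_names):
    """Map each integer/pointer parameter to the register that carries it."""
    registers = INTEGER_ARGUMENT_REGISTERS[abi_name]
    if len(parameter_names) > len(registers):
        # Extra arguments go on the stack; that case is out of scope here.
        raise ValueError("this sketch does not model stack-passed arguments")
    return dict(zip(parameter_names, registers))


# Hypothetical parameter list for a three-argument kernel.
params = ["src", "dst", "length"]
for abi_name in ("x64-sysv", "x64-ms"):
    print(abi_name, assign_argument_registers(abi_name, params))
```

A hand-written kernel would have to hard-code one of these two mappings; generating both versions from one source removes that duplication.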

[h=2]Installation[/h]

PeachPy can be installed from PyPI:

pip install PeachPy

[h=2]Example[/h]

# peachpy.c is referenced by name below, so import it explicitly
import peachpy.c
from peachpy.x64 import *

# Use 'x64-ms' for Microsoft x64 ABI
abi = peachpy.c.ABI('x64-sysv')
assembler = Assembler(abi)

# Implement function void add_1(const uint32_t *src, uint32_t *dst, size_t length)
src_argument = peachpy.c.Parameter("src", peachpy.c.Type("const uint32_t*"))
dst_argument = peachpy.c.Parameter("dst", peachpy.c.Type("uint32_t*"))
len_argument = peachpy.c.Parameter("length", peachpy.c.Type("size_t"))

# This optimized kernel will target Intel Nehalem processors. Any instructions which are not
# supported on Intel Nehalem (e.g. AVX instructions) will generate an error. If you don't have
# a particular target in mind, use "Unknown"
with Function(assembler, "add_1", (src_argument, dst_argument, len_argument), "Nehalem"):
    # Load arguments into registers
    srcPointer = GeneralPurposeRegister64()
    LOAD.PARAMETER( srcPointer, src_argument )

    dstPointer = GeneralPurposeRegister64()
    LOAD.PARAMETER( dstPointer, dst_argument )

    length = GeneralPurposeRegister64()
    LOAD.PARAMETER( length, len_argument )

    # Main processing loop. Length must be a nonzero multiple of 4.
    LABEL( 'loop' )

    # Load four uint32 elements (16 bytes) and advance the source pointer
    x = SSERegister()
    MOVDQU( x, [srcPointer] )
    ADD( srcPointer, 16 )

    # Add 1 to x
    PADDD( x, Constant.uint32x4(1) )

    # Store the result and advance the destination pointer
    MOVDQU( [dstPointer], x )
    ADD( dstPointer, 16 )

    # Four elements done; loop until the counter reaches zero
    SUB( length, 4 )
    JNZ( 'loop' )

    RETURN()

# Print the generated assembly
print(assembler)
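When experimenting with a kernel like this, it can help to have a slow reference model to compare results against. The following is a minimal sketch in plain Python (no PeachPy involved) of what add_1 computes: it walks the buffer 16 bytes at a time and adds 1 to each of the four 32-bit lanes with wrap-around, mirroring the MOVDQU / PADDD / MOVDQU sequence. The function name add_1_reference and the sample data are assumptions made for this illustration, and the lanes are assumed to be little-endian, as on x86.

```python
import struct


def add_1_reference(src_bytes, length):
    """Return the bytes add_1 would write to dst for `length` uint32 elements."""
    if length <= 0 or length % 4 != 0:
        # The assembly loop only terminates for a nonzero multiple of 4.
        raise ValueError("length must be a positive multiple of 4")
    dst = bytearray()
    offset = 0
    while length != 0:
        # MOVDQU load: four little-endian uint32 lanes (16 bytes)
        lanes = struct.unpack_from("<4I", src_bytes, offset)
        # PADDD: lane-wise addition, wrapping modulo 2**32
        lanes = tuple((lane + 1) & 0xFFFFFFFF for lane in lanes)
        # MOVDQU store: write the 16-byte result
        dst += struct.pack("<4I", *lanes)
        offset += 16  # ADD on both pointers
        length -= 4   # SUB length, 4 followed by JNZ
    return bytes(dst)


# Hypothetical sample input: eight uint32 values, including one that wraps.
src = struct.pack("<8I", 0, 1, 2, 3, 41, 0xFFFFFFFF, 7, 99)
result = struct.unpack("<8I", add_1_reference(src, 8))
print(result)  # (1, 2, 3, 4, 42, 0, 8, 100)
```

The 0xFFFFFFFF element becoming 0 reflects the wrapping behaviour of PADDD; a model that used unbounded Python integers without the mask would disagree with the kernel on that lane.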

Source: https://bitbucket.org/MDukhan/peachpy
