Byte-ul Posted August 9, 2014

Overview of Handling Data with Best Practices in Mind

Handling user-supplied data is a big part of many web applications, and it's critical that this is done properly to prevent security holes. There are a number of best practices and principles we can follow when handling data, though I'll just be covering the ones I feel are the most important:

Practices:
- Treat all user data as tainted until it has been validated; never assume the integrity of such data (guilty until proven innocent).
- Make your users abide by your validation rules. That is to say, do not attempt to correct invalid data, because doing so opens the door to security vulnerabilities.
- Keep track of data as it enters and exits parts of your application. This is critical in order to tell which data is potentially tainted and which data has been validated and is safe to use.

Principles:
- Minimise exposure of sensitive data. This covers not storing passwords in cookies, not sending passwords via the HTTP GET method (where they end up in URLs, browser history and server logs), not storing configuration files in the document root, and so on.
- Defence in Depth: the use of redundant safeguards. Having additional layers of protection in place (which should never have to be used, but are there just in case) improves the security of a web application.

Filtering Input

Filtering input should be done whenever applicable to prevent junk data from entering a web application. It is performed on data coming into an application, where its validity is inspected. There are a number of ways we can filter our users' input, and the method you choose will depend on the input data you're looking to handle. As such, I'll run through just a few commonly used functions and libraries to give you more of an idea of how this inspection process works.
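Putting the first and third practices into code, one common pattern is to copy only validated values into a separate $clean array, so it's always obvious which data is still tainted. The sketch below is one way to do that using ctype_alnum() and filter_var(), both covered in this post; the function name filterInput(), the field names and their validation rules are all hypothetical.

```php
<?php
// Sketch of "tainted until validated": raw input is never used directly.
// Only values that pass validation are copied into $clean; invalid values
// are rejected, never "corrected". filterInput() and its rules are hypothetical.
function filterInput(array $raw)
{
    $clean = array();

    // Username: 3 to 16 alphanumeric characters (an assumed rule).
    if (isset($raw['username']) && is_string($raw['username'])
        && ctype_alnum($raw['username'])
        && strlen($raw['username']) >= 3
        && strlen($raw['username']) <= 16) {
        $clean['username'] = $raw['username'];
    }

    // Age: an integer between 18 and 100 (another assumed rule).
    if (isset($raw['age'])) {
        $age = filter_var($raw['age'], FILTER_VALIDATE_INT,
            array('options' => array('min_range' => 18, 'max_range' => 100)));
        if ($age !== false) {
            $clean['age'] = $age; // stored as an int, not the raw string
        }
    }

    return $clean;
}

// Anything that is not in $clean is still tainted and must not be used.
$clean = filterInput(array('username' => 'alice99', 'age' => '42x'));
var_dump($clean); // only 'username' survives; '42x' is not a valid int
```

Keeping the two arrays apart means a quick search for $clean shows exactly where trusted data is used.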
I'll (try to) explicitly reference the practices and principles stated above when I use them.

The Character Type Functions (ctype_)

The character type functions come from the Ctype extension, which is full of handy functions that can be used to validate user input. They work by checking whether every character of a string belongs to a particular character class, much like a simplistic regular expression. All of the Ctype functions are known as predicate functions because they only return a boolean value (TRUE or FALSE). Here's a list of the Ctype functions:
- ctype_alnum: alphanumeric characters
- ctype_alpha: alphabetic characters
- ctype_cntrl: control characters
- ctype_digit: numeric characters
- ctype_graph: printable characters, excluding space
- ctype_lower: lowercase letters
- ctype_print: printable characters
- ctype_punct: printable characters that are not whitespace or alphanumeric
- ctype_space: whitespace characters
- ctype_upper: uppercase letters
- ctype_xdigit: hexadecimal digits

Tip: Ensure that you always pass strings to these functions, even if the values are numeric (passing non-string arguments is also deprecated as of PHP 8.1). This is because the PHP manual states:
"If an integer between -128 and 255 inclusive is provided, it is interpreted as the ASCII value of a single character (negative values have 256 added in order to allow characters in the Extended ASCII range). Any other integer is interpreted as a string containing the decimal digits of the integer."

Further Reading:
An Introduction to Ctype Functions
PHP Manual - Ctype

filter_var

The filter_var function accepts three arguments: the variable to validate, the filter to apply (a constant), and an optional array of options or bitmask of flags for that filter. Some simple scenarios where you'll want to use this function are validating URLs and e-mail addresses. Validating them with regular expressions is not a good idea, even if you know your way around them.

The function has two primary types of filtering: validation and sanitisation. Validation filtering checks the data for validity: FALSE is returned if the data fails the check, and the filtered data is returned on success.
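One consequence of returning the data on success is that a perfectly valid value can itself be falsy, which is why the result should always be compared against FALSE strictly (!== FALSE). A minimal sketch; isValidInt() is a hypothetical helper name:

```php
<?php
// Returns TRUE only when $value passes FILTER_VALIDATE_INT.
// isValidInt() is a hypothetical helper, not part of PHP.
function isValidInt($value)
{
    // A loose check such as `if (filter_var(...))` would wrongly reject "0",
    // because the filter returns int(0) on success and 0 is falsy.
    return filter_var($value, FILTER_VALIDATE_INT) !== false;
}

var_dump(filter_var('42', FILTER_VALIDATE_INT));  // int(42): string converted to int
var_dump(filter_var('0', FILTER_VALIDATE_INT));   // int(0): valid, but falsy
var_dump(filter_var('4.2', FILTER_VALIDATE_INT)); // bool(false): not an integer
var_dump(isValidInt('0'));                         // bool(true)
```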
Sanitisation filtering will attempt to remove or encode any invalid characters and return the sanitised string, according to the filter used. This does not mean the result is safe to send out of your application without further escaping.

Here are a few simple and common use cases:

Validation filtering:

    <?php
    $email = 'valid@email.com';
    if (filter_var($email, FILTER_VALIDATE_EMAIL) !== FALSE) {
        // valid email
    }

    $url = 'http://domain.tld';
    if (filter_var($url, FILTER_VALIDATE_URL) !== FALSE) {
        // valid URL
    }

    $age = 20;
    $options = array('options' => array('min_range' => 18, 'max_range' => 100));
    if (filter_var($age, FILTER_VALIDATE_INT, $options) !== FALSE) {
        // valid age
    }

(More examples on PHP.net)

Sanitisation filtering:

    <?php
    $output = 'Protecting against XSS: <script>alert(0)</script>';
    echo filter_var($output, FILTER_SANITIZE_FULL_SPECIAL_CHARS);
    // Protecting against XSS: &lt;script&gt;alert(0)&lt;/script&gt;

    $int = 3.3;
    echo filter_var($int, FILTER_SANITIZE_NUMBER_INT); // 33
    // Note that it strips invalid characters (here, the decimal point)
    // rather than truncating the value as an (int) cast would

(More examples on PHP.net)

Further Reading:
Filters

It's all about Type

It's a well-known fact that PHP is a loosely-typed language. Data types do not need to be declared for variables or function parameters, and method signatures do not need to specify types either. But that's not to say that type is unimportant.

Tip: Because PHP is loosely typed, it's always best practice to perform strict comparisons (=== and !==).

Type-Checking

Type-checking in PHP can be done with the is_ functions, a set of predicate functions that return TRUE if the value is of the given type, or FALSE otherwise. The following is a list of these functions:
- is_array
- is_bool
- is_callable
- is_float
- is_int
- is_null
- is_numeric
- is_object
- is_resource
- is_scalar
- is_string

There is also the instanceof operator, which checks whether the left-hand operand (an object) is an instance of the class named by the right-hand operand, of one of its subclasses, or of a class implementing that interface.

Type-hinting

Support for type-hinting was first introduced in PHP 5, and has been a much-loved feature of the PHP community.
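As a quick illustration of what a hint actually enforces, here's a minimal sketch. It assumes PHP 7 or later, where passing an argument of the wrong type throws a TypeError that can be caught (under PHP 5 the same mistake is a catchable fatal error); countAccessories() is a hypothetical function.

```php
<?php
// Assumes PHP 7+: a type-hint violation throws a TypeError.
// countAccessories() is a hypothetical function used only for illustration.
function countAccessories(array $accessories)
{
    return count($accessories);
}

echo countAccessories(array('spoiler', 'roof rack')), "\n"; // prints 2

try {
    countAccessories('spoiler'); // a string, not an array
} catch (TypeError $e) {
    // Reached because the argument did not match the array hint
    echo "Rejected: argument must be an array\n";
}
```

The function body never has to check the type itself; the hint rejects bad input before the body runs.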
Method parameters should take advantage of type hinting whenever possible, because it improves maintainability and makes code less error-prone (and partially self-documenting). PHP 5 supports the following hint types: class and interface names (so an Iterator or Traversable hint covers iterators), arrays (as of PHP 5.1), and callables (as of PHP 5.4). If an argument of the wrong type is passed to a function, a catchable fatal error is produced (as of PHP 7, a TypeError is thrown instead).

Type hints are used like so:

    <?php
    class TypeHinting
    {
        private $carObject;
        private $accessories = array();

        public function __construct(CarObjectInterface $carObject)
        {
            $this->carObject = $carObject;
        }

        public function addAccessories(array $accessories)
        {
            $this->accessories = $accessories;
        }
    }

The above is a much cleaner and more legible snippet than the following (which does not take advantage of type-hinting):

    <?php
    class TypeHinting
    {
        private $carObject;
        private $accessories = array();

        public function __construct($carObject)
        {
            if (!($carObject instanceof CarObjectInterface)) {
                trigger_error('Fatal error: wrong type!', E_USER_ERROR);
            }
            $this->carObject = $carObject;
        }

        public function addAccessories($accessories)
        {
            if (!is_array($accessories)) {
                trigger_error('Fatal error: wrong type!', E_USER_ERROR);
            }
            $this->accessories = $accessories;
        }
    }

PHP 5 does not, however, support type-hinting for scalars (string, int, bool, float) or for resources, and traits cannot be used as types at all. Scalar type declarations were later added in PHP 7.0, along with an optional strict_types mode. For PHP 5, there are workarounds for scalar hints in the comments section on PHP.net, though beware that some may slow down your PHP applications.

Type Casting

When we perform a type cast in PHP, we convert a value from its current type to another type.
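To make the conversion concrete, here's a small sketch of what casting a few strings produces. The key point is that a cast never fails: whatever the input, you get a value of the target type, so a cast on its own cannot tell you whether the input was valid.

```php
<?php
// Casting always succeeds, even on junk input.
var_dump((int) '42');     // int(42)
var_dump((int) '42abc');  // int(42): trailing junk silently dropped
var_dump((int) 'abc');    // int(0): invalid input becomes 0
var_dump((int) '3.9');    // int(3): fractional part truncated
var_dump((bool) '0');     // bool(false)
var_dump((bool) '0.0');   // bool(true): only '' and '0' are falsy strings
```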
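PHP supports the $var = (type) $var; syntax (similar to C and Java), where (type) can be any one of the following:
- (int) or (integer)
- (bool) or (boolean)
- (float), (double) or (real)
- (string) or (binary)
- (array)
- (object)
- (unset), which casts to NULL

((real) and (unset) were removed in PHP 8.0.)

Type casting is commonly used to coerce integers from user input:

    <?php
    if (isset($_GET['id'])) {
        // force the id from the HTTP GET request to be an integer
        $id = (int) $_GET['id'];
    }

Note that casting is coercion rather than validation: it never fails, so a non-numeric value such as 'abc' silently becomes 0. If invalid input must be rejected rather than corrected, validate it first (for example with ctype_digit() or filter_var() with FILTER_VALIDATE_INT).

We can also use the settype() function to force a variable to a particular type.

The Whitelist Approach

Whitelisting assumes that there is a limited set of valid values for the data (such as an image uploader, where the file type is limited to images). We specify the only possibilities the data can take, and anything else is discarded as invalid. This is commonly done with an array, where the in_array() function checks that a value exists within the array and is therefore valid.

    <?php
    $languages = array('PHP', 'JavaScript', 'Ruby', 'Elixir');
    $inputLanguage = 'VB.net';

    if (in_array($inputLanguage, $languages, TRUE)) {
        // valid language
    } else {
        // invalid language
    }

We could also use an if/elseif/else or switch statement, though these are more commonly used for flow-control logic with simple comparisons than for whitelisting potential values.

Tip: Always pass TRUE as the third argument to in_array() to perform a strict comparison, unless a loose comparison is absolutely necessary. A strict comparison (equivalent to the identity operators === and !==) checks the type as well as the value, which prevents strange things from happening (check out the "The Mystery of Value Appearance" section of this article).

The opposite of the whitelist approach is to provide a blacklist of all unwanted values. This should only be done when you know exactly which possibilities aren't allowed, such as with an IP address blacklist.

Regular Expressions

Regular expressions, or regex, are used for checking the format of input data and matching complex patterns. They should be used sparingly since they come at a performance cost, but they are a powerful and concise DSL (Domain-Specific Language). They do require good knowledge of PCRE syntax, and patterns should always be tested extensively before being deployed, since their complexity makes it easy to slip up.

Due to the amount of content there is to cover when teaching regex, it will have to be done in another tutorial. For now, if you'd like to see how to use regular expressions, I'd recommend the following websites:
- Introduction to PHP Regex (a step-by-step tutorial)
- RexEgg (for detailed information on regex)
- Regular Expressions (another tutorial-based website)

Escaping Output

Escaping output is a method of preservation, carried out on data as it exits an application, that prevents the receiving end from interpreting the data (for example, as markup or as part of a query). There are two primary exits for data from an application: to the browser as client-side code, and to the database inside queries.

To the Browser
See Cross-Site Scripting for information on preserving special characters in output.

To a Database
See Structured Query Language Injection for information on preserving input values in queries.

Credits: http://www.hackforums.net/showthread.php?tid=4238146
Edited August 9, 2014 by Byte-ul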
They should be used sparingly since they come at a cost of performance, but are a powerful and concise DSL (Domain-Specific Language) when used. They do require good knowledge of PCRE regex, and the patterns used should always be extensively tested before being deployed since their complexity can make it easy to slip-up.Due to the amount of content there is to cover when teaching regex, it will have to be done in another tutorial. But for now, if you'd like to check out how to use regular expressions, then I'd recommend the following websites:Introduction to PHP Regex (a step-by-step tutorial)Rexegg (for detailed information on regex)Regular Expressions (another tutorial-based website)Escaping OutputEscaping output to prevent interpretation of it is a method of preservation that is carried out upon data exiting an application. There are two primary exits of data from an application: to the browser as client-side code, and to the database inside queries.To the BrowserSee - Cross-Site Scripting for information on preservation of outputting special entities.To a DatabaseSee - Structured Query Language Injection for information on preservation of input values.Credits: http://www.hackforums.net/showthread.php?tid=4238146 Edited August 9, 2014 by Byte-ul Quote