AMD unveils powerful 'stream computing' chip

AMD's upcoming FireStream processor might be a way for scientists to tap into a lot of performance without breaking the bank.

The company will be demonstrating its FireStream 9170 processor next week at the SC07 supercomputing show, and executives spoke this week about the promise of "stream computing." The 9170 is designed to let high-performance computing applications take advantage of the excellent parallel performance of a graphics chip.

The big trend in chip design over the past few years has been parallelism. Instead of trying to crunch all the data through a single path moving as fast as possible, the cool kids are now adding paths so data can flow down multiple outlets. This allows the chip to run at slower speeds, and therefore cooler temperatures.

Graphics processing units (GPUs) have been doing this for years. The high-performance discrete graphics chips from companies like Nvidia and AMD's ATI division have been designed with parallel performance in mind for a very long time. Certain types of customers in labs and research facilities would love to be able to tap into that kind of processing power, but GPUs require special programming techniques.

AMD is trying to bridge the gap between PC processors that are easy to program and graphics chips that offer great performance with the FireStream 9170. Think of it as a high-end graphics chip with a lot more memory than usually ships with those products, said Robert Feldstein, vice president of engineering for AMD.

The performance will be there. The 9170 is essentially one of ATI's high-end discrete graphics chips that has been tricked out with more memory and double-precision floating-point units, which carry roughly twice as many significant digits as single-precision units and matter for scientific code where rounding errors accumulate. It comes with 2GB of memory, compared with 512MB on the most powerful ATI graphics chip.

But the programming is still a little tricky. You'll need a software developer's kit, and you'll probably only want to port limited amounts of your code to run on the 9170.

"You don't have a researcher that's trying to port over thousands of lines of legacy code. They have a particular algorithm that (the researcher) knows will run well on a GPU," said Patricia Harrell, director of stream computing for AMD. "You're not worried about changing code for something that gives you an order of magnitude increase (in performance)," she said.

The 9170 isn't going to be out until the first quarter of next year, as AMD's graphics priorities for the holiday season are discrete graphics chips for PCs that all of us can use. It will cost $1,999, which might seem like a lot, but this is something you should be able to add into an existing workstation or server for a performance boost when you need it, rather than buying a fancy server for just a few lines of code.

Eventually, AMD wants to integrate this type of technology directly onto a PC or server processor. It has already announced plans to integrate graphics chips onto PC chips as part of its Fusion project, but it hasn't identified a timeframe for putting its powerful stream computing technology on a PC chip.

Filed under // stream computing
Posted 3 months ago

IBM tackles high-volume 'stream computing'

IBM this week launched System S, a software platform built on five years of research into the real-time analysis of large amounts of unstructured business or scientific data.

IBM calls the resulting technology "stream computing," because the software deals with streams of data.

Also this week, IBM opened the IBM European Stream Computing Center, headquartered in Dublin. The center will serve as a hub of research, customer support, and advanced testing for stream-computing applications.

 

 

System S is IBM's answer to the growing problem of data overload, the company said. In particular, it is a response to the growing amount of unstructured data--such as Web pages, e-mails, blogs, video and data captured from electronic sensors--that organizations are faced with processing.

The new IBM software is designed specifically to handle such information, as well as the structured data found in databases. It processes this data in real time, giving users the ability to make decisions based on that analysis right away, according to IBM.

"Traditional computing models retrospectively analyze stored data and cannot continuously process massive amounts of incoming data streams that affect critical decision-making," IBM said in a statement. "System S is designed to help clients become more 'real-world aware', seeing and responding to changes across complex systems."

The software is written using a programming language specifically developed for stream computing, called SPADE (stream processing application declarative engine). It is designed to run on a variety of hardware platforms, including clusters, multicore architectures and chips such as the Cell processor, IBM said.

The system can be used to analyze data such as stock prices, retail sales, and weather reports. IBM is aiming it at financial institutions, government and law enforcement agencies, and retailers, among other organizations.

System S is being used in a number of pilot projects that demonstrate the diverse types of applications IBM is targeting for the software.

Uppsala University and the Swedish Institute of Space Physics, for instance, are using a pilot system to analyze the way radio emissions from space affect energy transmission over power lines, communications via radio and TV signals, and airline and space travel, IBM said.

The Marine Institute of Ireland is using the system to monitor large volumes of underwater acoustic information, while TD Securities is using the software to develop a pilot of an automated options-trading system. A pilot at the University of Ontario Institute of Technology is using System S to monitor streams of biomedical data from critically ill premature babies.

The software is currently available in English directly from IBM, with prices ranging from $100,000 for a two-server installation up to several million dollars for a large cluster with hundreds of nodes, IBM said.

System S is scheduled to be released in multiple languages and through IBM business partners sometime in 2010.

Filed under // stream computing
Posted 3 months ago

Stream Computing on Graphics Hardware

By: Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Kayvon Fatahalian, Mike Houston, and Pat Hanrahan
Computer Science Department
Stanford University

To appear at SIGGRAPH 2004

Abstract

In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming coprocessor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications, the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and up to seven times faster than their CPU counterparts.
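
For readers unfamiliar with SAXPY, one of the benchmarks named above, the operation is the standard BLAS update y <- a*x + y applied element by element. The sketch below is plain Scheme on lists, not Brook code, and the procedure name saxpy is chosen here for illustration. It only shows why the operation suits a streaming coprocessor: each output element depends solely on the inputs at the same index.

```scheme
;; SAXPY on lists: result[i] = a * xs[i] + ys[i].
;; No element depends on any other, so every index could in
;; principle be computed in parallel.
(define (saxpy a xs ys)
  (map (lambda (x y) (+ (* a x) y)) xs ys))

(define saxpy-result (saxpy 2 '(1 2 3) '(10 20 30)))
```

With the inputs shown, saxpy-result is the list (12 24 36).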

Paper http://graphics.stanford.edu/papers/brookgpu/brookgpu.pdf

Presentation http://graphics.stanford.edu/papers/brookgpu/buck.Brook.pdf

 

 

Filed under // HIVE, research, stream computing
Posted 3 months ago

HiDISC - Hierarchical Decoupled Instruction Stream Computer

HiDISC is a hierarchical decoupled instruction stream computer that has a compute processor, an access processor (which feeds operands from the cache to the compute processor), and a cache management processor (which can prefetch data from memory into the cache). HiDISC is designed to address the problem of increasing memory latency while taking advantage of instruction-level parallelism.

via: http://www.east.isi.edu/~crago/hidisc/

Filed under // stream computing
Posted 4 months ago

A Library of Streams

 

Title

A Library of Streams

Author

Philip L. Bewig

Status

This SRFI is currently in ``final'' status. It is also deprecated and superseded by SRFI 41. To see an explanation of each status that a SRFI can hold, see here. To comment on this SRFI, please mail to srfi minus 40 at srfi dot schemers dot org. See instructions here to subscribe to the list. You can access the discussion via the archive of the mailing list. You can access post-finalization messages via the archive of the mailing list.

  • Received: 2003/02/03
  • Draft: 2003/02/03-2003/04/03
  • Revised: 2003/08/02
  • Revised: 2003/12/23
  • Final: 2004/08/22

    Abstract

    Along with higher-order functions, one of the hallmarks of functional programming is lazy evaluation. A primary manifestation of lazy evaluation is lazy lists, generally called streams by Scheme programmers, where evaluation of a list element is delayed until its value is needed.

    The literature on lazy evaluation distinguishes two styles of laziness, called even and odd. Odd style streams are ubiquitous among Scheme programs and can be easily encoded with the Scheme primitives delay and force defined in R5RS. However, the even style delays evaluation in a manner closer to that of traditional lazy languages such as Haskell and avoids an "off by one" error that is symptomatic of the odd style.

    This SRFI defines the stream data type in the even style, some essential procedures and syntax that operate on streams, and motivates our choice of the even style. A companion SRFI 41 Stream Library provides additional procedures and syntax which make for more convenient processing of streams and shows several examples of their use.

    Rationale

    Two of the defining characteristics of functional programming languages are higher-order functions, which provide a powerful tool to allow programmers to abstract data representations away from an underlying concrete implementation, and lazy evaluation, which allows programmers to modularize a program and recombine the pieces in useful ways. Scheme provides higher-order functions through its lambda keyword and lazy evaluation through its delay keyword. A primary manifestation of lazy evaluation is lazy lists, generally called streams by Scheme programmers, where evaluation of a list element is delayed until its value is needed. Streams can be used, among other things, to compute with infinities, conveniently process simulations, program with coroutines, and reduce the number of passes over data. This library defines a minimal set of functions and syntax for programming with streams.

    Scheme has a long tradition of computing with streams. The great computer science textbook Structure and Interpretation of Computer Programs uses streams extensively. The example given in R5RS makes use of streams to integrate systems of differential equations using the method of Runge-Kutta. MIT Scheme, the original implementation of Scheme, provides streams natively. Scheme and the Art of Programming discusses streams. Some Scheme-like languages also have traditions of using streams: Winston and Horn, in their classic Lisp textbook, discuss streams, and so does Larry Paulson in his text on ML. Streams are an important and useful data structure.

    Basically, a stream is much like a list: it is either null or consists of an object (the stream element) followed by another stream; the difference from a list is that elements aren't evaluated until they are accessed. All the streams mentioned above use the same underlying representation, with the null stream represented by '() and stream pairs constructed by (cons car (delay cdr)), which must be implemented as syntax. These streams are known as head-strict, because the head of the stream is always computed, whether or not it is needed.

    Streams are the central data type -- just as arrays are for most imperative languages and lists are for Lisp and Scheme -- for the "pure" functional languages Miranda and Haskell. But those streams are subtly different from the traditional Scheme streams of SICP et al. The difference is at the head of the stream, where Miranda and Haskell provide streams that are fully lazy, with even the head of the stream not computed until it is needed. We'll see in a moment the operational difference between the two types of streams.

    Philip Wadler, Walid Taha, and David MacQueen, in their paper "How to add laziness to a strict language without even being odd", describe how they added streams to the SML/NJ compiler. They discuss two kinds of streams: odd streams, as in SICP et al, and even streams, as in Haskell; the names odd and even refer to the parity of the number of constructors (delay, cons, nil) used to represent the stream. Here are the first two figures from their paper, rewritten in Scheme:

    ;;; FIGURE 1 -- ODD                 
    				    
    (define nil1 '())                   
    				    
    (define (nil1? strm)                
      (null? strm))                     
    				    
    (define-syntax cons1                
      (syntax-rules ()                  
        ((cons1 obj strm)               
          (cons obj (delay strm)))))    
    				    
    (define (car1 strm)                 
      (car strm))                       
    				    
    (define (cdr1 strm)                 
      (force (cdr strm)))               
    				    
    (define (map1 func strm)            
                                        
      (if (nil1? strm)                  
        nil1                            
        (cons1                          
          (func (car1 strm))            
          (map1 func (cdr1 strm)))))    
    				    
    (define (countdown1 n)              
                                        
      (cons1 n (countdown1 (- n 1))))   
    				    
    (define (cutoff1 n strm)            
      (cond                             
        ((zero? n) '())                 
        ((nil1? strm) '())              
        (else                           
          (cons                         
            (car1 strm)                 
            (cutoff1 (- n 1)             
                     (cdr1 strm))))))    
    
    ;;; FIGURE 2 -- EVEN
    
    (define nil2 (delay '()))
    
    (define (nil2? strm)
      (null? (force strm)))
    
    (define-syntax cons2
      (syntax-rules ()
        ((cons2 obj strm)
         (delay (cons obj strm)))))
    
    (define (car2 strm)
      (car (force strm)))
    
    (define (cdr2 strm)
      (cdr (force strm)))
    
    (define (map2 func strm)
      (delay (force
        (if (nil2? strm)
          nil2
          (cons2
            (func (car2 strm))
            (map2 func (cdr2 strm)))))))
    
    (define (countdown2 n)
      (delay (force
        (cons2 n (countdown2 (- n 1))))))
    
    (define (cutoff2 n strm)
      (cond
        ((zero? n) '())
        ((nil2? strm) '())
        (else
          (cons
            (car2 strm)
            (cutoff2 (- n 1)
                     (cdr2 strm))))))
    

    It is easy to see the operational difference between the two kinds of streams, using an example adapted from the paper:

    > (define (12div n) (/ 12 n))       
    > (cutoff1 4                        
        (map1 12div (countdown1 4)))    
    error: divide by zero               
    
    > (define (12div n) (/ 12 n))
    > (cutoff2 4
        (map2 12div (countdown2 4)))
    (3 4 6 12)
    

    The problem of odd streams is that they do too much work, having an "off-by-one" error that causes them to evaluate the next element of a stream before it is needed. Mostly that's just a minor leak of space and time, but if evaluating the next element causes an error, such as dividing by zero, it's a silly, unnecessary bug.

    It is instructive to look at the coding differences between odd and even streams. We expect the two constructors nil and cons to be different, and they are; the odd nil and cons return a strict list, but the even nil and cons return promises. Nil?, car and cdr change to accommodate the underlying representation differences. Cutoff is identical in the two versions, because it doesn't return a stream.

    The subtle but critical difference is in map and countdown, the two functions that return streams. They are identical except for the (delay (force ...)) that wraps the return value in the even version. That looks odd, but is correct. It is tempting to just eliminate the (delay (force ...)), but that doesn't work, because, given a promise x, even though (delay (force x)) and x both evaluate to x when forced, their semantics are different, with x being evaluated and cached in one case but not the other. That evaluation is, of course, the same "off-by-one" error that caused the problem with odd streams. Note that (force (delay x)) is something different entirely, even though it looks much the same.
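
    The timing difference can be observed directly with a side effect. The following is a minimal sketch, not taken from the paper or from this SRFI: make-stream and the events counter are hypothetical names, and the procedure stands in for any stream-returning function whose body does some work before it hands back a promise.

```scheme
;; Hypothetical counter recording how many times the body below has run.
(define events 0)

;; Stands in for a stream-returning procedure: the body does some work
;; (here, bumping the counter) and then returns a promise.
(define (make-stream)
  (set! events (+ events 1))
  (delay '()))

;; Calling the procedure directly runs its body at once.
(define direct (make-stream))
(define after-direct events)

;; Wrapping the call in (delay (force ...)) postpones the body.
(define wrapped (delay (force (make-stream))))
(define after-wrapped events)

;; Only forcing the wrapper runs the body and yields the stream's value.
(define forced (force wrapped))
(define after-force events)
```

    Under these assumptions after-direct and after-wrapped are both 1, and after-force is 2: the wrapped call did no work until it was forced. By contrast, (force (delay (make-stream))) would run the body immediately and return the inner promise unforced, which is why it is not a substitute.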

    Unfortunately, that (delay (force ...)) is a major notational inconvenience, because it means that the representation of streams can't be hidden inside a few primitives but must infect each function that returns a stream, making streams harder to use, harder to explain, and more prone to error. Wadler et al solve the notational inconvenience in their SML/NJ implementation by adding special syntax -- the keyword lazy -- within the compiler. Since Scheme allows syntax to be added via a macro, it doesn't require any compiler modifications to provide streams. Shown below is a Scheme implementation of Figures 1 to 3 from the paper, with the (delay (force ...)) hidden within stream-define, which is the syntax used to create a function that returns a stream:

    ;;; FIGURE 1 -- ODD      
    			 
    (define nil1             
      '())                   
    			 
    (define (nil1? strm)     
      (null? strm))          
    			 
    (define-syntax cons1     
      (syntax-rules ()       
        ((cons1 obj strm)    
          (cons              
            obj              
              (delay         
                strm)))))    
    			 
    (define (car1 strm)      
      (car strm))            
    			 
    (define (cdr1 strm)      
      (force (cdr strm)))

    (define (map1 func strm)
                             
      (if (nil1? strm)       
        nil1                 
        (cons1               
          (func              
            (car1 strm))     
          (map1              
            func             
            (cdr1            
              strm)))))      
    			 
    (define (countdown1 n)   
                             
      (cons1                 
        n                    
        (countdown1          
          (- n 1))))         
    			 
    (define (cutoff1 n strm) 
      (cond                  
        ((zero? n) '())      
        ((nil1? strm) '())   
        (else                
          (cons              
            (car1 strm)      
            (cutoff1         
              (- n 1)        
              (cdr1          
                strm))))))   
    
    ;;; FIGURE 2 -- EVEN     
    			 
    (define nil2             
      (delay '()))           
    			 
    (define (nil2? strm)
      (null? (force strm)))
    			 
    (define-syntax cons2     
      (syntax-rules ()       
        ((cons2 obj strm)    
          (delay             
            (cons            
              obj            
              strm)))))      
    			 
    (define (car2 strm)      
      (car (force strm)))    
    			 
    (define (cdr2 strm)      
      (cdr (force strm)))

    (define (map2 func strm)
      (delay (force		 
        (if (nil2? strm)     
          nil2               
          (cons2             
            (func            
              (car2 strm))   
            (map2            
              func           
              (cdr2          
                strm)))))))  
    			 
    (define (countdown2 n)   
      (delay (force		 
        (cons2               
          n                  
          (countdown2        
            (- n 1))))))     
    			 
    (define (cutoff2 n strm) 
      (cond                  
        ((zero? n) '())      
        ((nil2? strm) '())   
        (else                
          (cons              
            (car2 strm)      
            (cutoff2         
              (- n 1)        
              (cdr2          
                strm))))))   
    
    ;;; FIGURE 3 -- EASY
    
    (define nil3
      (delay '()))
    
    (define (nil3? strm)
      (null? (force strm)))
    
    (define-syntax cons3
      (syntax-rules ()
        ((cons3 obj strm)
          (delay
            (cons
              obj
              strm)))))
    
    (define (car3 strm)
      (car (force strm)))
    
    (define (cdr3 strm)
      (cdr (force strm)))
    
    (define-syntax stream-define
     (syntax-rules ()
      ((stream-define (name args ...)
                      body0 body1 ...)
       (define (name args ...)
        (delay (force
         (begin body0 body1 ...)))))))
    
    (stream-define (map3 func strm)
    
      (if (nil3? strm)
        nil3
        (cons3
          (func
            (car3 strm))
          (map3
            func
            (cdr3
              strm)))))
    
    (stream-define (countdown3 n)
    
      (cons3
        n
        (countdown3
          (- n 1))))
    
    (define (cutoff3 n strm)
      (cond
        ((zero? n) '())
        ((nil3? strm) '())
        (else
          (cons
            (car3 strm)
            (cutoff3
              (- n 1)
              (cdr3
                strm))))))
    

    It is now easy to see the notational inconvenience of Figure 2, as the bodies of map1 and map3 are identical, as are countdown1 and countdown3. All of the inconvenience is hidden in the stream primitives, where it belongs, so functions that use the primitives won't be burdened. This means that users can just step up and use the library without any knowledge of how the primitives are implemented, and indeed the implementation of the primitives can change without affecting users of the primitives, which would not have been possible with the streams of Figure 2. With this implementation of streams, (cutoff3 4 (map3 12div (countdown3 4))) evaluates to (3 4 6 12), as it should.

    This library provides streams that are even, not odd. This decision overturns years of experience in the Scheme world, but follows the traditions of the "pure" functional languages such as Miranda and Haskell. The primary benefit is elimination of the "off-by-one" error that odd streams suffer. Of course, it is possible to use even streams to represent odd streams, as Wadler et al show in their Figure 4, so nothing is lost by choosing even streams as the default.

    Obviously, stream elements are evaluated when they are accessed, not when they are created; that's the definition of lazy. Additionally, stream elements must be evaluated only once, and the result cached in the event it is needed again; that's common practice in all languages that support streams. Following the rule of R5RS section 1.1 fourth paragraph, an implementation of streams is permitted to delete a stream element from the cache and reclaim the storage it occupies if it can prove that the stream element cannot possibly matter to any future computation.

    The fact that objects are permitted, but not required, to be reclaimed has a significant impact on streams. Consider for instance the following example, due to Joe Marshall. Stream-filter is a function that takes a predicate and a stream and returns a new stream containing only those elements of the original stream that pass the predicate; it can be simply defined as follows:

        (stream-define (stream-filter pred? strm)
          (cond ((stream-null? strm) strm)
                ((pred? (stream-car strm))
                  (stream-cons (stream-car strm)
                               (stream-filter pred? (stream-cdr strm))))
                (else (stream-filter pred? (stream-cdr strm)))))
    

    But this implementation of stream-filter has a problem:

        ;; from0 is the stream 0, 1, 2, ... so the multiples of n are
        ;; 0, n, 2n, 3n, ...; three stream-cdrs reach 3n.
        (define (times3 n)
          (stream-car
            (stream-cdr
              (stream-cdr
                (stream-cdr
                  (stream-filter
                    (lambda (x) (zero? (modulo x n)))
                    from0))))))
    

    Called as (times3 5), the function evaluates to 15, as desired. But called as (times3 1000000), it churns the disk, creating closures and caching each result as it counts slowly to 3,000,000; on most Scheme systems, this function will run out of memory long before it computes an answer. A space leak occurs when there is a gap between elements that pass the predicate, because the naive definition hangs on to the head of the gap. Unfortunately, this space leak can be very hard to fix, depending on the underlying Scheme implementation, and solutions that work in one Scheme implementation may not work in another. And, since R5RS itself doesn't specify any safe-for-space requirements, this SRFI can't make any specific requirements either. Thus, this SRFI encourages native implementations of the streams described in this SRFI to "do the right thing" with respect to space consumption, and implement streams that are as safe-for-space as the rest of the implementation. Of course, if the stream is bound in a scope outside the stream-filter expression, there is nothing to be done except cache the elements as they are filtered.

    Although stream-define has been discussed as the basic stream abstraction, in fact it is the (delay (force ...)) mechanism that is the basis for everything else. In the spirit of Scheme minimality, the specification below gives stream-delay as the syntax for converting an expression to a stream; stream-delay is similar to delay, but returns a stream instead of a promise. Given stream-delay, it is easy to create stream-lambda, which returns a stream-valued function, and then stream-define, which binds a stream-valued function to a name. However, stream-lambda and stream-define are both library procedures, not fundamental to the use of streams, and are thus excluded from this SRFI.
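
    The layering described above can be sketched as follows. This is an illustration under a stated assumption, not the reference implementation: streams are represented as plain promises, exactly as in Figure 3, so stream-delay reduces to (delay (force ...)). A conforming implementation must make streams distinguishable from every other object, which plain promises are not. The name twelve-div is a hypothetical stand-in for the 12div procedure used earlier.

```scheme
;; Figure 3 primitives, repeated so the sketch is self-contained.
(define nil3 (delay '()))
(define (nil3? strm) (null? (force strm)))
(define-syntax cons3
  (syntax-rules ()
    ((cons3 obj strm) (delay (cons obj strm)))))
(define (car3 strm) (car (force strm)))
(define (cdr3 strm) (cdr (force strm)))

;; Assumption: a stream is a plain promise, so delaying a stream-valued
;; expression is just (delay (force ...)).
(define-syntax stream-delay
  (syntax-rules ()
    ((stream-delay expr) (delay (force expr)))))

;; A stream-valued function: its whole body is wrapped in stream-delay.
(define-syntax stream-lambda
  (syntax-rules ()
    ((stream-lambda formals body0 body1 ...)
     (lambda formals (stream-delay (begin body0 body1 ...))))))

;; Binds a stream-valued function to a name.
(define-syntax stream-define
  (syntax-rules ()
    ((stream-define (name . formals) body0 body1 ...)
     (define name (stream-lambda formals body0 body1 ...)))))

(stream-define (map3 func strm)
  (if (nil3? strm)
      nil3
      (cons3 (func (car3 strm))
             (map3 func (cdr3 strm)))))

(stream-define (countdown3 n)
  (cons3 n (countdown3 (- n 1))))

;; Returns an ordinary list, so it is defined with plain define.
(define (cutoff3 n strm)
  (cond ((zero? n) '())
        ((nil3? strm) '())
        (else (cons (car3 strm)
                    (cutoff3 (- n 1) (cdr3 strm))))))

(define (twelve-div n) (/ 12 n))
(define result (cutoff3 4 (map3 twelve-div (countdown3 4))))
```

    With these definitions result is (3 4 6 12); the fifth element, which would divide by zero, is never evaluated.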

    Specification

    A stream-pair is a data structure consisting of two fields called the stream-car and stream-cdr. Stream-pairs are created by the procedure stream-cons, and the stream-car and stream-cdr fields are accessed by the procedures stream-car and stream-cdr. There also exists a special stream object called stream-null, which is a single stream object with no elements, distinguishable from all other stream objects and, indeed, from all other objects of any type. The stream-cdr of a stream-pair must be either another stream-pair or stream-null.

    Stream-null and stream-pair are used to represent streams. A stream can be defined recursively as either stream-null or a stream-pair whose stream-cdr is a stream. The objects in the stream-car fields of successive stream-pairs of a stream are the elements of the stream. For example, a two-element stream is a stream-pair whose stream-car is the first element and whose stream-cdr is a stream-pair whose stream-car is the second element and whose stream-cdr is stream-null. A chain of stream-pairs ending with stream-null is finite and has a length that is computed as the number of elements in the stream, which is the same as the number of stream-pairs in the stream. A chain of stream-pairs not ending with stream-null is infinite and has undefined length.

    The way in which a stream can be infinite is that no element of the stream is evaluated until it is accessed. Thus, any initial prefix of the stream can be enumerated in finite time and space, but still the stream remains infinite. Stream elements are evaluated only once; once evaluated, the value of a stream element is saved so that the element will not be re-evaluated if it is accessed a second time. Streams and stream elements are never mutated; all functions involving streams are purely applicative. Errors are not required to be signalled, as in R5RS section 1.3.2, although implementations are encouraged to detect and report errors.
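
    Both properties, finite enumeration of a prefix and evaluate-once caching, can be observed with a counter. The sketch below is illustrative only and again assumes that streams are plain promises rather than the distinct stream type this SRFI specifies; the names from, take and evaluated are hypothetical.

```scheme
;; Hypothetical counter: how many stream cells have been computed so far.
(define evaluated 0)

;; Infinite even-style stream n, n+1, n+2, ...
;; Each cell bumps the counter when it is first forced.
(define (from n)
  (delay (begin
           (set! evaluated (+ evaluated 1))
           (cons n (from (+ n 1))))))

;; Returns the first k elements as an ordinary list.
;; When k is zero the stream is not forced at all.
(define (take k strm)
  (if (zero? k)
      '()
      (let ((pair (force strm)))
        (cons (car pair) (take (- k 1) (cdr pair))))))

(define naturals (from 0))

(define first-pass (take 3 naturals))
(define after-first evaluated)

;; Walking the same prefix again reuses the cached cells.
(define second-pass (take 3 naturals))
(define after-second evaluated)
```

    Both passes return (0 1 2), and the counter stays at 3 after the second pass: only the three cells that were accessed were ever computed, and each was computed once.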

    stream-null (constant)
    Stream-null is the distinguished nil stream, a single Scheme object distinguishable from all other objects. If the last stream-pair in a stream contains stream-null in its cdr field, the stream is finite and has a computable length. However, there is no need for streams to terminate.
        stream-null                                 => (stream)
    
    (stream-cons object stream) (syntax)
    Stream-cons is the primitive constructor of streams, returning a stream with the given object in its car field and the given stream in its cdr field. The stream returned by stream-cons must be different (in the sense of eqv?) from every other Scheme object. The object may be of any type, and there is no requirement that successive elements of a stream be of the same type, although it is common for them to be. It is an error if the second argument of stream-cons is not a stream.
        (stream-cons 'a stream-null)                => (stream 'a)
        (stream-cons 'a (stream 'b 'c 'd))          => (stream 'a 'b 'c 'd)
        (stream-cons "a" (stream 'b 'c))            => (stream "a" 'b 'c)
        (stream-cons 'a 3)                          => error
        (stream-cons (stream 'a 'b) (stream 'c))    => (stream (stream 'a 'b) 'c)
    
    (stream? object) (function)
    Stream? returns #t if the object is a stream, and otherwise returns #f. A stream object may be either the null stream or a stream pair created by stream-cons.
        (stream? stream-null)                       => #t
        (stream? (stream-cons 'a stream-null))      => #t
        (stream? 3)                                 => #f
    
    (stream-null? object) (function)
    Stream-null? returns #t if the object is the distinguished nil stream, and otherwise returns #f.
        (stream-null? stream-null)                  => #t
        (stream-null? (stream-cons 'a stream-null)) => #f
        (stream-null? 3)                            => #f
    
    (stream-pair? object) (function)
    Stream-pair? returns #t if the object is a stream pair created by stream-cons, and otherwise returns #f.
        (stream-pair? stream-null)                  => #f
        (stream-pair? (stream-cons 'a stream-null)) => #t
        (stream-pair? 3)                            => #f
    
    (stream-car stream) (function)
    Stream-car returns the object in the stream-car field of a stream-pair. It is an error to attempt to evaluate the stream-car of stream-null.
        (stream-car (stream 'a 'b 'c))              => a
        (stream-car stream-null)                    => error
        (stream-car 3)                              => error
    
    (stream-cdr stream) (function)
    Stream-cdr returns the stream in the stream-cdr field of a stream-pair. It is an error to attempt to evaluate the stream-cdr of stream-null.
        (stream-cdr (stream 'a 'b 'c))              => (stream 'b 'c)
        (stream-cdr stream-null)                    => error
        (stream-cdr 3)                              => error
    
    
    (stream-delay expression) (syntax)
    Stream-delay is the essential mechanism for operating on streams, taking an expression and returning a delayed form of the expression that can be asked at some future point to evaluate the expression and return the resulting value. The action of stream-delay is analogous to the action of delay, but it is specific to the stream data type, returning a stream instead of a promise; no corresponding stream-force is required, because each of the stream functions performs the force implicitly.
        (define from0
          (let loop ((x 0))
            (stream-delay
              (stream-cons x (loop (+ x 1))))))
        from0                                       => (stream 0 1 2 3 4 5 6 ...)
    
    (stream object ...) (library function)
    Stream returns a newly allocated finite stream of its arguments, in order.
        (stream 'a (+ 3 4) 'c)                      => (stream 'a 7 'c)
        (stream)                                    => stream-null
    
    (stream-unfoldn generator seed n) (function)
    Stream-unfoldn returns n streams whose contents are produced by successive calls to generator, which takes the current seed as an argument and returns n + 1 values:

    (generator seed) -> seed result0 ... result(n-1)

    where resultI indicates how to produce the next element of the Ith result stream:

    (value)   value is the next car of this result stream
    #f        no new information for this result stream
    ()        the end of this result stream has been reached
    Note that getting the next element in any particular result stream may require multiple calls to generator.
        (define (take5 s)
          (stream-unfoldn
            (lambda (x)
              (let ((n (car x)) (s (cdr x)))
                (if (zero? n)
                    (values 'dummy '())
                    (values
                      (cons (- n 1) (stream-cdr s))
                      (list (stream-car s))))))
            (cons 5 s)
            1))
        (take5 from0)                              => (stream 0 1 2 3 4)
    
    (stream-map function stream ...) (library function)
    Stream-map creates a newly allocated stream built by applying function elementwise to the elements of the streams. The function must take as many arguments as there are streams and return a single value (not multiple values). The stream returned by stream-map is finite if the given stream is finite, and infinite if the given stream is infinite. If more than one stream is given, stream-map terminates when any of them terminates, or is infinite if all the streams are infinite. The stream elements are evaluated in order.
        (stream-map (lambda (x) (+ x x)) from0)      => (stream 0 2 4 6 8 10 ...)
        (stream-map + (stream 1 2 3) (stream 4 5 6)) => (stream 5 7 9)
        (stream-map (lambda (x) (expt x x))
          (stream 1 2 3 4 5))                        => (stream 1 4 27 256 3125)
    
    (stream-for-each procedure stream ...) (library function)
    Stream-for-each applies procedure elementwise to the elements of the streams, calling the procedure for its side effects rather than for its values. The procedure must take as many arguments as there are streams. The value returned by stream-for-each is unspecified. The stream elements are visited in order.
        (stream-for-each display from0)             => no value, prints 01234 ...
    
    (stream-filter predicate? stream) (library function)
    Stream-filter applies predicate? to each element of stream and creates a newly allocated stream consisting of those elements of the given stream for which predicate? returns a non-#f value. Elements of the output stream are in the same order as they were in the input stream, and are tested by predicate? in order.
        (stream-filter odd? stream-null)            => stream-null
        (take5 (stream-filter odd? from0))          => (stream 1 3 5 7 9)
    

    Implementation

    A reference implementation of streams is shown below. It strongly prefers simplicity and clarity to efficiency, and though a reasonable attempt is made to be safe-for-space, no promises are made. The reference implementation relies on the mechanism for defining record types of SRFI-9, and the functions any and every from SRFI-1. The stream-error function aborts by calling error as defined in SRFI-23.

    ;;; PROMISES A LA SRFI-45:
    
    ;;; A separate implementation is necessary to
    ;;; have promises that answer #t to stream?
    ;;; This requires lots of complicated type conversions.
    
    (define-record-type s:promise (make-s:promise kind content) s:promise?
      (kind    s:promise-kind    set-s:promise-kind!)
      (content s:promise-content set-s:promise-content!))
    
    (define-record-type box (make-box x) box?
      (x unbox set-box!))
    
    (define-syntax srfi-40:lazy
      (syntax-rules ()
        ((lazy exp)
         (make-box (make-s:promise 'lazy (lambda () exp))))))
    
    (define (srfi-40:eager x)
      (make-stream (make-box (make-s:promise 'eager x))))
    
    (define-syntax srfi-40:delay
      (syntax-rules ()
        ((srfi-40:delay exp) (srfi-40:lazy (srfi-40:eager exp)))))
    
    (define (srfi-40:force promise)
      (let ((content (unbox promise)))
        (case (s:promise-kind content)
          ((eager) (s:promise-content content))
          ((lazy)
           (let* ((promise* (stream-promise ((s:promise-content content))))
                  (content  (unbox promise)))
             (if (not (eqv? 'eager (s:promise-kind content)))
                 (begin
                   (set-s:promise-kind! content (s:promise-kind (unbox promise*)))
                   (set-s:promise-content! content (s:promise-content (unbox promise*)))
                   (set-box! promise* content)))
             (srfi-40:force promise))))))
    
    
    ;;; STREAM -- LIBRARY OF SYNTAX AND FUNCTIONS TO MANIPULATE STREAMS
    
    ;;; A stream is a new data type, disjoint from all other data types, that
    ;;; contains a promise that, when forced, is either nil (a single object
    ;;; distinguishable from all other objects) or consists of an object
    ;;; (the stream element) followed by a stream.  Each stream element is
    ;;; evaluated exactly once, when it is first retrieved (not when it is
    ;;; created); once evaluated its value is saved to be returned by
    ;;; subsequent retrievals without being evaluated again.
    
    ;; STREAM-TYPE -- type of streams
    ;; STREAM? object -- #t if object is a stream, #f otherwise
    (define-record-type stream-type
      (make-stream promise)
      stream?
      (promise stream-promise))
    
    ;;; UTILITY FUNCTIONS
    
    ;; STREAM-ERROR message -- print message then abort execution
    ;  replace this with a call to the native error handler
    ;  if stream-error returns, so will the stream library function that called it
    (define stream-error error)
    
    ;;; STREAM SYNTAX AND FUNCTIONS
    
    ;; STREAM-NULL -- the distinguished nil stream
    (define stream-null (make-stream (srfi-40:delay '())))
    
    ;; STREAM-CONS object stream -- primitive constructor of streams
    (define-syntax stream-cons
      (syntax-rules ()
        ((stream-cons obj strm)
         (make-stream
          (srfi-40:delay
           (if (not (stream? strm))
               (stream-error "attempt to stream-cons onto non-stream")
               (cons obj strm)))))))
    
    ;; STREAM-NULL? object -- #t if object is the null stream, #f otherwise
    (define (stream-null? obj)
      (and (stream? obj) (null? (srfi-40:force (stream-promise obj)))))
    
    ;; STREAM-PAIR? object -- #t if object is a non-null stream, #f otherwise
    (define (stream-pair? obj)
      (and (stream? obj) (not (null? (srfi-40:force (stream-promise obj))))))
    
    ;; STREAM-CAR stream -- first element of stream
    (define (stream-car strm)
      (cond ((not (stream? strm)) (stream-error "attempt to take stream-car of non-stream"))
            ((stream-null? strm)  (stream-error "attempt to take stream-car of null stream"))
            (else (car (srfi-40:force (stream-promise strm))))))
    
    ;; STREAM-CDR stream -- remaining elements of stream after first
    (define (stream-cdr strm)
      (cond ((not (stream? strm)) (stream-error "attempt to take stream-cdr of non-stream"))
            ((stream-null? strm)  (stream-error "attempt to take stream-cdr of null stream"))
            (else (cdr (srfi-40:force (stream-promise strm))))))
    
    ;; STREAM-DELAY object -- the essential stream mechanism
    (define-syntax stream-delay
      (syntax-rules ()
        ((stream-delay expr)
          (make-stream
            (srfi-40:lazy expr)))))
    
    ;; STREAM object ... -- new stream whose elements are object ...
    (define (stream . objs)
      (let loop ((objs objs))
        (stream-delay
          (if (null? objs)
              stream-null
              (stream-cons (car objs) (loop (cdr objs)))))))
    
    ;; STREAM-UNFOLDN generator seed n -- n streams from (generator seed)
    (define (stream-unfoldn gen seed n)
      (define (unfold-result-stream gen seed)
        (let loop ((seed seed))
          (stream-delay
            (call-with-values
              (lambda () (gen seed))
              (lambda (next . results)
                (stream-cons results (loop next)))))))
      (define (result-stream->output-stream result-stream i)
        (stream-delay
          (let ((result (list-ref (stream-car result-stream) i)))
            (cond ((pair? result)
                    (stream-cons (car result)
                                 (result-stream->output-stream
                                   (stream-cdr result-stream) i)))
                  ((not result)
                    (result-stream->output-stream (stream-cdr result-stream) i))
                  ((null? result) stream-null)
                  (else (stream-error "can't happen"))))))
      (define (result-stream->output-streams result-stream n)
        (let loop ((i 0) (outputs '()))
          (if (= i n)
            (apply values (reverse outputs))
            (loop (+ i 1)
                  (cons (result-stream->output-stream result-stream i)
                        outputs)))))
      (result-stream->output-streams (unfold-result-stream gen seed) n))
    
    ;; STREAM-MAP func stream ... -- stream produced by applying func element-wise
    (define (stream-map func . strms)
      (cond ((not (procedure? func)) (stream-error "non-functional argument to stream-map"))
            ((null? strms) (stream-error "no stream arguments to stream-map"))
            ((not (every stream? strms)) (stream-error "non-stream argument to stream-map"))
            (else (let loop ((strms strms))
                    (stream-delay
                      (if (any stream-null? strms)
                          stream-null
                          (stream-cons (apply func (map stream-car strms))
                                       (loop (map stream-cdr strms)))))))))
    
    ;; STREAM-FOR-EACH proc stream ... -- apply proc element-wise for side-effects
    (define (stream-for-each proc . strms)
      (cond ((not (procedure? proc)) (stream-error "non-functional argument to stream-for-each"))
            ((null? strms) (stream-error "no stream arguments to stream-for-each"))
            ((not (every stream? strms)) (stream-error "non-stream argument to stream-for-each"))
            (else (let loop ((strms strms))
                    (if (not (any stream-null? strms))
                        (begin (apply proc (map stream-car strms))
                               (loop (map stream-cdr strms))))))))
    
    ;; STREAM-FILTER pred? stream -- new stream including only items passing pred?
    (define (stream-filter pred? strm)
      (cond ((not (procedure? pred?)) (stream-error "non-functional argument to stream-filter"))
            ((not (stream? strm)) (stream-error "attempt to apply stream-filter to non-stream"))
            (else (stream-unfoldn
                    (lambda (s)
                      (cond
                        ((stream-null? s)
                         (values stream-null '()))
                        ((pred? (stream-car s))
                         (values (stream-cdr s) (list (stream-car s))))
                        (else
                         (values (stream-cdr s) #f))))
                    strm
                    1))))

    References

    •  Harold Abelson, Gerald Jay Sussman, Julie Sussman: Structure and Interpretation of Computer Programs, 1996, MIT Press.
    •  Lawrence C. Paulson: ML for the Working Programmer, 2nd edition, Cambridge University Press, 1996.
    •  George Springer and Daniel P. Friedman: Scheme and the Art of Programming, MIT Press and McGraw-Hill, 1989.
    •  Philip Wadler, Walid Taha, and David MacQueen: "How to add laziness to a strict language without even being odd", 1998 ACM SIGPLAN Workshop on ML, pp. 24-30.
    •  Patrick H. Winston, Berthold K. Horn: Lisp, 3rd edition, Addison Wesley, 1989.

    Stream Computing FAQ

    What is stream computing?
    Stream computing (or stream processing) refers to a class of compute problems, applications or tasks that can be broken down into parallel, identical operations and run simultaneously on a single processor device. These parallel data streams entering the processor device, computations taking place and the output from the device define stream computing.

    Today, stream computing is primarily the realm of the graphics processor unit (GPU) where the parallel processes used to produce graphics imagery are used instead to perform arithmetic calculations.

    Characteristics of stream computing:

    • Enable new applications on new architectures
    • Parallel problems other than graphics that map well on GPU architecture
    • Transition from fixed function to programmable pipelines
    • Various proof points in research and industry under the name GPGPU

    How does stream computing differ from computation on the CPU?
    Stream computing takes advantage of a SIMD methodology (single instruction, multiple data), whereas a CPU follows a modified SISD methodology (single instruction, single data), with the modifications accounting for various parallelism techniques.

    The benefit of stream computing stems from the highly parallel architecture of the GPU, in which tens to hundreds of parallel operations are performed with each clock cycle, whereas the CPU can at best perform only a small handful of parallel operations per clock cycle.
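
    As a rough illustration of the SIMD idea, the Python sketch below applies one small kernel to every element of its input streams. The names saxpy_kernel and run_kernel are hypothetical and are not part of any AMD SDK. On a GPU the per-element calls would be spread across the stream cores; the loop here only models the result, not the speed.

```python
from typing import Callable, List, Sequence


def saxpy_kernel(a: float, x: float, y: float) -> float:
    """One instruction sequence, written for a single element: a*x + y."""
    return a * x + y


def run_kernel(kernel: Callable[..., float], a: float,
               xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Apply the same kernel to every (x, y) pair.

    No element's result depends on any other element's result, so a
    stream processor is free to compute all of them at the same time.
    """
    if len(xs) != len(ys):
        raise ValueError("input streams must have the same length")
    return [kernel(a, x, y) for x, y in zip(xs, ys)]


result = run_kernel(saxpy_kernel, 2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(result)  # [12.0, 24.0, 36.0]
```

    The kernel never looks at a neighbouring element, which is what lets the hardware run many copies of it in lockstep.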

    What are AMD's stream computing product features?
    AMD's FireStream™ 9170, our latest generation stream computing GPU, features:

    • 320 stream cores (compute units or ALUs)
    • 2GB on-board GDDR3 memory
    • Double precision floating point support
    • PCIe 2.0 x16 interface

    View AMD FireStream 9170 specifications.

    What are AMD's stream computing product advantages?
    AMD's FireStream 9170 hardware:

    • Only company positioned to offer a unique platform with strengths in accelerated GPU as well as CPU computing
    • Stream computing today leading to fusion tomorrow

    AMD's open systems SDK approach:

    • CTM initiative — Release low level specifications to enable developers and end users to understand the architecture and tuning to maximize performance
    • Deliver high level, multi-targeted compilers through Brook, 3rd parties like RapidMind, and partnerships with universities and industry.
    • Deliver library functions through AMD's ACML, APL, Cobra, and through university partner program.


    When can I get an AMD stream computing product and what does it cost?
    The FireStream 9170, AMD's flagship stream computing platform, is scheduled to be available in Q1 2008 in quantity. Please contact us for a price quote.

    A software development kit containing compilers, libraries, performance profiling tools and drivers is available for download. SDK version 1.0 will be available in Q1 2008.

    This SDK is a compilation of open source software and proprietary AMD software that has been released as open source.

    Included in the first release are compilers, performance profilers, AMD's core math library (ACML) and AMD's compute abstraction layer (CAL) which enables device programming in familiar high-level languages rather than graphics programming specific to the GPU.

    Please read our stream computing whitepaper (PDF 1.1MB) for more information about this SDK.

    How does AMD's stream computing address the IEEE754 standard for double precision floating point computation?
    The IEEE754 standard defines formats for representing single and double-precision floating point numbers as well as some special cases like denorms, infinities and NaNs. It also defines four rounding modes and methods for exception handling.

    When we were preparing to launch our stream computing initiative in 2006, we conducted a series of customer interviews to gather requirements relative to this standard. We learned that as long as we handled the special cases according to the most common usage, complete IEEE754 compliance wasn't required. AMD's FireStream 9170 implementation should handle a large majority of customers' requirements.

    In the AMD FireStream 9170:

    • Infinities and NaNs are handled as defined by the IEEE754 standard.
    • Rounding is handled using the "round to nearest" mode, which is the mode generally used in most applications.
    • Denormal numbers are flushed to zero. This is a common optimization in implementations where full-speed hardware support is not available, and is adequate for most applications.
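
    To make the flush-to-zero behavior concrete, here is a minimal Python sketch that emulates it on ordinary IEEE754 doubles. The helper flush_to_zero is hypothetical. Whether the hardware keeps the sign of a flushed value, and whether it flushes inputs as well as results, is not stated above; the sketch assumes a signed zero and flushes only the value it is given.

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min  # smallest positive normal double


def flush_to_zero(x: float) -> float:
    """Return x, or a signed zero if x is a denormal (subnormal) double."""
    if x != 0.0 and abs(x) < SMALLEST_NORMAL:
        return math.copysign(0.0, x)
    return x  # normals, zeros, infinities and NaNs pass through unchanged


a = 1.5 * SMALLEST_NORMAL
b = SMALLEST_NORMAL
gap = a - b  # half of SMALLEST_NORMAL, which is a denormal

print(gap > 0.0)                   # True: gradual underflow keeps the difference
print(flush_to_zero(gap) == 0.0)   # True: flushed, although a != b
```

    The practical consequence is that, with flushing, a difference of zero no longer guarantees that the two operands were equal, which matters mainly for code that works very close to the bottom of the exponent range.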

    What does AMD's software stack look like?
    AMD has authored a whitepaper (PDF) that discusses our software stack.

    How does the AMD FireStream support Linux?
    AMD is committed to Linux and sees Linux as a major platform for stream computing. We recently announced an initiative for an open source Linux driver. We are continuing that momentum and expect the stream computing stack to support Linux over the next calendar year.

    What type of programming model does AMD use for AMD FireStream?
    AMD encourages a pure streaming/SIMD model for AMD FireStream. A few enhancements, such as data sharing, are useful for a small subset of applications. However, data sharing in a SIMD environment brings its own challenges and should be used with the utmost care; used incorrectly, it can actually degrade performance.

    Regarding specific compiler implementation choices — currently we have enabled Brook with a CAL backend. We are looking at other options as well, including industry standards.

    What happened to AMD's CTM?
    CAL is a natural evolution of CTM: we are building our software stack from the bottom up. We provide low-level access and specs as a CTM extension to CAL. CAL permits the end user to write portable code. Developers frequently like to drop down to the next level of detail for further tuning and profiling, and CAL provides this level of access.

    Will the AMD FireStream SDK work on previous generation hardware?
    To run the CAL/Brook+ SDK, you need a platform based on the AMD R600 GPU or later. R600 and newer GPUs are found on ATI Radeon™ HD2400, HD2600, HD2900 and HD3800 graphics boards.

    Which applications are best suited to Stream Computing?
    Applications best suited to stream computing possess two fundamental characteristics:

    1. A high degree of arithmetic computation per system memory fetch
    2. Computational independence — arithmetic occurs on each processing unit without needing to be checked or verified by or with arithmetic occurring on any other processing unit.

    Examples include:

    • Engineering — fluid dynamics
    • Mathematics — linear equations, matrix calculations
    • Simulations — Monte Carlo, molecular modeling, etc.
    • Financial — options pricing
    • Biological — protein structure calculations
    • Imaging — medical image processing
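
    The first characteristic above, arithmetic per memory fetch, can be estimated before any porting work. The sketch below is a back-of-the-envelope Python calculation, not a measurement of any AMD device. The function names are hypothetical, and the byte counts assume 8-byte double precision values and that each operand crosses the memory bus exactly once.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Floating point operations performed per byte of memory traffic."""
    return flops / bytes_moved


def vector_add_intensity(n: int, word: int = 8) -> float:
    # n additions; read two n-element vectors and write one.
    return arithmetic_intensity(n, 3 * n * word)


def matmul_intensity(n: int, word: int = 8) -> float:
    # n*n outputs, each needing about n multiplies and n adds; in the best
    # case each of the three n-by-n matrices crosses the memory bus once.
    return arithmetic_intensity(2 * n ** 3, 3 * n * n * word)


print(vector_add_intensity(1024))  # about 0.042 flops per byte at any size
print(matmul_intensity(1024))      # about 85.3 flops per byte, grows with n
```

    By this estimate, dense matrix multiplication does far more arithmetic per byte fetched than vector addition, which is why matrix calculations appear in the list of well-suited workloads.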

    If Stream processors are really GPUs, will I need to learn graphics programming to properly implement my application?
    No. AMD, along with the open source community, is working to mask the GPU's graphics programming heritage. This is being accomplished by our release of Brook+ (the open source Brook compiler plus AMD enhancements geared directly at non-graphics stream computing) and AMD's CAL, the Compute Abstraction Layer. CAL provides high-level language access to the various parts of the GPU as needed.

    Developers are thus able to write directly to the GPU without needing to learn graphics-specific programming languages. CAL provides direct communication to the device.

    Will future stream computing architectures force me to rewrite my applications?
    Implementing a new algorithm or application in a stream computing environment will require the use of various stream-specific techniques. These techniques and tools are all available in the AMD FireStream SDK described above.

    Existing applications that currently use only the CPU for computation will require recompiling to take advantage of the capabilities of the stream processor.

    We anticipate that, in the near term, most applications running in a stream computing environment will be written from scratch with the intent of targeting a stream computing platform.

    Over time as applications undergo typical rewrites and recompiles, those applications naturally suited to the stream computing environment will migrate to this environment along with the necessary recoding and recompiling tasks.

    Is stream computing a return to the old co-processor days?
    In many ways stream computing does resemble the days when vector co-processors handled substantial mathematical tasks. The benefit then as now is the remarkable performance boost gained through implementing these specialized components.

    We fully anticipate technological advancements as well as programming techniques to pull these co-processors closer to the CPU over time until, as with earlier co-processors, they disappear into the CPU itself.

    AMD's competitors offer similar but non-standardized products. Should I wait on product standardization before exploring Stream Computing?
    AMD is focused on providing the tools necessary to help our customers succeed with our AMD FireStream products, and we believe the open systems approach is a critical component of this philosophy. Open systems enable AMD, along with partners and 3rd party vendors, to collaborate closely when developing highly integrated solutions and to work independently when targeting a niche solution.

    AMD's open systems philosophy includes:

    • Open IL and ISA specifications to ensure developers can optimize system performance
    • Support for AMD Brook+ along with other 3rd party high-level tools to provide a choice of familiar development environments
    • Open source Linux drivers and AMD-enhanced Brook+ enabling developers to modify and retarget tools as needed
    • AMD partnership opportunities with system vendors and integrators to deliver customer-focused solutions

    Who can I contact at AMD for more information?
    Contact us with general questions about AMD's stream computing initiative, products, sales and training.

    Contact us with technical questions about FireStream hardware, software or developer issues.

    AMD Compute Abstraction Layer (CAL) FAQ

    How do I get started with AMD CAL SDK?
    We have assembled a number of documents to help guide you through the setup and early use of CAL. Please read these before getting started:

    We have also authored a programmer's guide which is included in the SDK download.

    Note that the three CAL files are also included in the SDK. We have posted them here as well for your reference prior to installing CAL.

    Why does the integrated installer (setup.exe) behave badly?
    The integrated installer is designed to invoke the CAL and BROOK+ installers in that order. Most of the installation logic is present in each individual MSI.

    If you have problems with this installer, please remove any previous versions of CAL/BROOK+ using Add/Remove programs in Control Panel and try again.

    Does the Repair/Modify option update my previous installation of CAL/BROOK+?
    No, you would need to completely remove the previous version of CAL/BROOK+. The Repair/Modify option only repairs the current version of CAL/BROOK+.

    What is CALROOT and where is it set and used?
    CALROOT is set as the path to the CAL SDK during installation. It is defined in the current user's Environment Variables. Other users on the system have to define CALROOT in their environment or in the system environment.

    The Visual Studio project files for the samples use CALROOT to locate the CAL headers and libraries. Some sample projects also use CALROOT to load themselves. You would not need CALROOT unless you wish to build the samples.
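
    Because CALROOT is defined only for the installing user, a quick check before building the samples can save a confusing build failure. The following Python sketch is one way to do that. The helper cal_sdk_root is hypothetical, and it assumes nothing about the SDK's internal layout beyond CALROOT naming an existing directory.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def cal_sdk_root(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the CAL SDK directory named by CALROOT, or None if unusable."""
    value = env.get("CALROOT", "").strip()
    if not value:
        return None  # variable missing for this user
    root = Path(value)
    return root if root.is_dir() else None


root = cal_sdk_root()
if root is None:
    print("CALROOT is not set (or points nowhere); the samples will not build.")
else:
    print(f"CAL SDK found at {root}")
```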

    Brook+ FAQ

    How do I get started with Brook+?
    Brook+ requires the following be installed or available to work with AMD FireStream technology:

    • Visual Studio (for Windows developers) or GCC (for Linux developers) installed and all environment variables correctly set up
    • Cygwin (for Windows) — must appear later in the PATH variable than the Visual Studio tools
    • CAL SDK installed, obtained from the same source as the Brook+ source tree
    • CALROOT environment variable set

    To build the compiler and runtime, enter the "platform" directory and type "make". This generates and fills the "sdk" directory.

    To build the samples, first build the sdk as above, then enter the "samples" directory and type "make".

    Where can I get CAL support?
    Information was shipped in the installer.

    What graphics driver does Brook+ require?
    Brook+ has no direct dependency on the graphics driver. CAL working correctly with a given driver should result in Brook+ working correctly.

    How is BROOKROOT used?
    The makefiles need CALROOT to be defined. BROOKROOT is not used by the makefile system.


    Two Worlds of Data – Unstructured and Structured

    Google, one of the premier free-form search engines on the planet, may be getting a little skittish about the Microsoft Longhorn project. Google wants to unleash its search technology on the enterprise and on the desktop, but with Longhorn, Microsoft plans to have that capability built into its operating system and its new file system. Is free-form searching really the next battleground? Or is bringing together two needles in a haystack of information really the holy grail of search technology?

    In the new category of enterprise software called business performance management (BPM), bringing together the worlds of structured and unstructured data can add significant value to the enterprise. BPM fosters new levels of corporate accountability, financial rigor and tangible value creation across the distributed global organization. It is driven by the imperative to align internal and external constituencies with business objectives through real-time availability and continuous exchange of financial, transactional and operational information. Effectively implemented, BPM enables enterprises to better shape and influence business outcomes by improving the caliber and speed of decision making. With it, executives can anticipate and respond to shifting market dynamics, intelligently allocate and utilize critical resources and consistently meet management and shareholder expectations.

    Data in BPM

    People use unstructured data every day. Although they may not be aware of it, they use it when creating, storing and retrieving reports, e-mails, spreadsheets and other types of documents. Unstructured data consists of any data stored in an unstructured format at an atomic level. That is, in the unstructured content, there is no conceptual definition and no data type definition - in textual documents, a word is simply a word. Some current technologies used for content searches on unstructured data require tagging entities such as names or applying keywords and meta tags. Therefore, human intervention is required to help make the unstructured data machine readable.

    People also use structured data every day. Structured data is anything that has an enforced composition to the atomic data types. Structured data is managed by technology that allows for querying and reporting against predetermined data types and understood relationships.

    Two Categories of Unstructured Data

    Unstructured data consists of two basic categories:

    • Bitmap Objects: Inherently non-language based, such as image, video or audio files.
    • Textual Objects: Based on a written or printed language, such as Microsoft Word documents, e-mails or Microsoft Excel spreadsheets.

    Both of these object types may be classified as data, but the technology and methodology for harnessing relevant information from bitmap objects is still in its infancy. Most of today's technology addresses textual objects. Enterprise content management (ECM) technologies, for example, can help contain unstructured data. Textual data mining and analysis vendors provide analysis tools for unstructured textual objects, and business intelligence vendors supply solutions for querying and analyzing structured data. However, bringing them together - querying both the unstructured and structured worlds - and then associating these two worlds at relevant points is where the most value is gained and also where the highest level of challenge is presented.

    Comparing these categories with structured data raises three distinct challenges:

    1. Even if unstructured data is in a format such as a Microsoft Word template, the data is still not consumable from a semantic level without a compatible interface or application.
    2. Even with a compatible technology, we cannot necessarily gain insight into the context of the information unless we can actually read it.
    3. And lastly, the way we interpret what we read is largely subjective.

    "A Picture is Worth..."

    One of the challenges when dealing with unstructured data is the written word and the fact that it often does not communicate the exact meaning intended. There is a stark division between the written word and the spoken word. The phonetically written word sacrifices worlds of meaning and perception that were once secured in hieroglyphics and still are in the Chinese ideogram. Writing systems such as these provide gestalt - an understanding of the whole within the picture. The Western alphabet lacks the ability to distinguish context and concept from the symbols.

    A recent Wall Street Journal article provides a good example of why it is not necessarily appropriate to assign a qualitative value to unstructured data. The Wall Street Journal performed an analysis of a collection of high schools, both public and private, and calculated the percentage of graduating seniors who were accepted to Ivy League schools. At the top of the list was a private school in Brooklyn, New York, called Saint Ann's. Saint Ann's came in first with a whopping 41 percent of graduates gaining admission to 10 of the nation's most exclusive schools, such as Yale, Harvard, Brown, Duke and Cornell. Saint Ann's even beat the Hopkins School in New Haven, Connecticut, where 51 percent of the students in the senior class this year were National Merit Scholars.1

    The interesting point about Saint Ann's and its success rate comes from the way teachers assign grades to their students. They don't. While at the school, students get written reports about their achievements and areas that need improvement, but there is never a quantitative number or letter assigned to their work. Upon graduation, students receive personal essays about their work from the school's headmaster. Therefore, Saint Ann's college applicants cannot be placed in a "GPA bucket" with other applicants. Each application from Saint Ann's must be read to acquire a full picture of the student. Students from other high schools may well be disqualified at the gate because of a poor GPA - a number assigned to a student meant to represent the quality of knowledge or learning the student has achieved. Perhaps there is some unstructured information that is not meant to have a number value assigned to it as a predicate of value.

    Two Approaches to the Problem

    There are two ways to approach the question of using this unstructured data. One is from the semantic construction viewpoint. Here, finding word variants and similar words, as search engines do, is inadequate. Differentiating between the word "balance" as a verb and as a noun is a feat of semantic and linguistic categorization. This approach requires taxonomies, ontologies and a semantic layer to build concept and category relationships. The second way, which is more readily achievable, is to bring the unstructured parts of the enterprise into the structured world. If we have already identified the context and semantics of our unstructured data, we can bring this information together with our structured data, bridging the two worlds and ultimately providing greater business insight.

    Sarbanes-Oxley

    A good example of bridging unstructured data to BPM concerns what is happening around the Sarbanes-Oxley Act. Companies are implementing solutions that must bring together the unstructured and structured worlds. Consider, for example, an analyst working on consolidating the Property, Plant & Equipment (PP&E) account for a company. The analyst will notice if the account appears out of tune with the rest of the schedule, as it will be highlighted in red. After noting the error, the analyst should be able to bring up a context menu for that account and gain access to a list of relevant items. The list may include a link to the GAAP definition of PP&E, the internal accounts that roll up to that master account, an audit trail of all the ledger entries or even information about the control environment. This is one way to help the enterprise gain access to associated and relevant information that exists both internally and externally. It further illustrates the strength of structured financial information brought together with unstructured content.

    Extensible Business Reporting Language (XBRL) is an attempt to standardize structured and unstructured data. Without such a standard, it is still difficult to relate the financial schedules to the footnotes and the Notes to Consolidated Financial Statements, except by physically reading them. XBRL helps solve this problem by tagging discrete elements of unstructured regulatory filings. For example, an analyst could pull all of the contingent liability sections across the current quarter's filings for an industry and perform a side-by-side comparison. Unfortunately, the adoption of XBRL is limited. Until regulators such as the SEC accept XBRL and provide analysis mechanisms for it, that type of analysis will remain a function of human searching and reading.
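
    To make the tagging idea concrete, here is a minimal sketch of pulling one tagged section out of several filings for a side-by-side view. The element names, company names and text are hypothetical and heavily simplified; real XBRL uses namespaced taxonomies and is considerably richer than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified filings; not real XBRL taxonomy elements.
FILINGS = [
    "<filing company='Acme'><Revenue>120</Revenue>"
    "<ContingentLiabilities>Pending patent suit.</ContingentLiabilities></filing>",
    "<filing company='Bolt'><Revenue>95</Revenue>"
    "<ContingentLiabilities>Environmental cleanup claim.</ContingentLiabilities></filing>",
]

def collect_sections(filings, tag):
    """Return {company: text} for every filing that contains the given tag."""
    sections = {}
    for xml_text in filings:
        root = ET.fromstring(xml_text)
        node = root.find(tag)
        if node is not None:
            sections[root.get("company")] = (node.text or "").strip()
    return sections

side_by_side = collect_sections(FILINGS, "ContingentLiabilities")
for company, text in side_by_side.items():
    print(f"{company}: {text}")
```

    Because the section is tagged, no one has to read each filing to find it; the comparison itself can still be left to the analyst.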

    Mergers and Acquisitions

    Another example is mergers and acquisitions (M&A) analysis conducted through a structured tool. Running numbers and creating models and forecasts is useful, but it isn't enough. The greatest value is gained from related stories and the relationship of those stories to the M&A target as part of the analysis. It is critical to combine the structured number-crunching with the important but less tangible content from unstructured sources. While value can be gained from mining the unstructured world itself, the greatest value is obtained by marrying the unstructured and structured worlds. It is this combination that drives the most significant value to BPM.

    Sales Force Automation

    One other example of bringing unstructured data to the structured world in the context of BPM is in sales force automation. A sales manager may have a dashboard that tracks his sales pipeline. All of the sales representatives reporting to the manager may roll up to one aggregate forecast. One component or metric in this forecast may be the number of calls made to a prospective client. The dashboard is configured to show a green light for eight or more calls to a client, yellow for four to seven calls and red for fewer than four. These may be good metrics, but the key is what lies beneath the numbers and thresholds.
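
    The threshold rule is simple enough to write down directly. This is only a sketch: the function name is made up for illustration, and it assumes that a count of exactly eight is treated as green so that the bands do not overlap.

```python
def call_light(calls: int) -> str:
    """Map a call count to a dashboard light using non-overlapping bands."""
    if calls < 0:
        raise ValueError("call count cannot be negative")
    if calls >= 8:      # eight or more calls
        return "green"
    if calls >= 4:      # four to seven calls
        return "yellow"
    return "red"        # fewer than four calls

for count in (2, 5, 9):
    print(count, call_light(count))
```

    Note how little the rule knows: it sees only the count, never what was said on any of the calls.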

    If past performance has shown that more calls to a client will result in a sale, then it makes sense to move forward with that metric. But what if it doesn't? What if three calls from a certain sales rep are a good sign? What if the fewer calls that rep makes, the better the chance of a sale? Today, the sales manager can read all of the call detail to get the needed information. What is really needed is an automated analysis of that unstructured information that can produce a symbol, color or some type of visualization to uncover and communicate its meaning. For instance, if the unstructured information contains objections from a client, that information will only become evident when reading the report. Wouldn't it be helpful to have the completed analysis and a link pointing to documents or other data that can answer the objections automatically?

    To gain true insight from the data, it is necessary to consider not the number of calls, but the calls themselves. Necessary data includes the details of each call, the tone of the call, the length of the call and the participants on the call. Was it a voicemail, a secretary or an "on a conference call, can you call me back" call? Or, more important, was it a follow-up call detailing a proof of concept in progress that lasted for more than an hour? Obviously, this last type of call contains more value than the three prior types. However, obtaining the deeper insight means referencing the underlying meaning of the metric.
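
    One way to picture the difference is to score calls by what kind of call they were and how long they lasted, rather than counting them. The sketch below is purely illustrative: the categories, the weights and the scoring formula are assumptions, not something taken from a real sales system.

```python
from dataclasses import dataclass

# Hypothetical weights: how much each kind of contact is assumed to signal progress.
KIND_WEIGHT = {
    "voicemail": 0.0,
    "secretary": 0.1,
    "call_back_later": 0.2,
    "follow_up": 1.0,
}

@dataclass
class Call:
    kind: str
    minutes: int

def engagement_score(calls):
    """Sum weight * minutes so that long, substantive calls dominate the total."""
    return sum(KIND_WEIGHT.get(c.kind, 0.0) * c.minutes for c in calls)

three_short = [Call("voicemail", 1), Call("secretary", 2), Call("call_back_later", 1)]
one_long = [Call("follow_up", 65)]

# One hour-long follow-up outweighs three brush-offs, although 1 < 3 as a raw count.
print(engagement_score(three_short), engagement_score(one_long))
```

    A count-based dashboard would rank the three short calls higher; a score like this reverses the order, which is closer to what a manager reading the call notes would conclude.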

    Solutions and Paths to Success

    Enterprise content management systems are now gaining wider adoption, and this provides access to unstructured data and the metadata on top of it. However, it is not possible to look upon that data and grasp it as a whole. We must read the text to gain understanding. Additionally, what we gain from reading is still through our own lens - the picture each of us conjures may be different.

    This is precisely why intelligent systems with semantic layers and taxonomies can connect to the unstructured world. As we define what the documents are and what relationships exist between the unstructured and structured worlds of data, we can bring a unified view, a clear definition and greater gestalt or understanding of business drivers - the heart of BPM.

    To put it more concretely, one of the tenets of BPM is the ability to gain insight through measuring results and managing performance. In the May issue of DM Review, Dan Sullivan of the Ballston Group commented on some approaches to textual data mining. One point he made is that a tool chosen for textual mining should include some type of clustering algorithm and visualization tools, such as thematic maps. This is altogether the correct approach for starting the textual data mining process on top of enterprise content management systems. The ability to drive the correct visualizations from the marriage of the unstructured world and the structured world is crucial. The next step is relating those thematic maps to the atomic or the aggregate structured data. The result? We are given the ability to improve our overall business performance by leveraging insight that will drive better business decisions, such as which product to build to improve profitability, which customers to target or how to model a proposed acquisition.
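
    As a rough illustration of what "some type of clustering algorithm" can mean for text, the sketch below groups short notes by word overlap. It is a deliberately naive stand-in (greedy, single pass, Jaccard similarity on word sets), not the method Sullivan describes or one a production text mining tool would use; the sample notes and the threshold are invented.

```python
def jaccard(a: set, b: set) -> float:
    """Share of distinct words that two documents have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of (index, word_set); the first entry is the seed
    for i, text in enumerate(docs):
        words = set(text.lower().split())
        for members in clusters:
            if jaccard(words, members[0][1]) >= threshold:
                members.append((i, words))
                break
        else:
            clusters.append([(i, words)])
    return [[i for i, _ in members] for members in clusters]

notes = [
    "client objects to price",
    "client objects to contract price",
    "proof of concept in progress",
    "proof of concept going well",
]
print(cluster(notes))  # → [[0, 1], [2, 3]]
```

    Each resulting group is a candidate theme (here, pricing objections and proof-of-concept updates) that could then be tied back to the structured records it came from.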

    Reference:
    1. Bernstein, Elizabeth. "The Price of Admission." Wall Street Journal. April 2, 2004. Page W1.

     

    Geoffrey Weglarz has worked in the information industries for 15 years. He has a background in software development, relational database technologies, multidimensional database technologies and linguistics. He can be reached at geoffrey_weglarz@hyperion.com.

    Filed under  //  data   stream computing  
    Posted 4 months ago

    Structured Data vs. Unstructured Data

    "The labels "structured data" and "unstructured data" are often used ambiguously by different interest groups; and often used lazily to cover multiple distinct aspects of the issue. In reality, there are at least three orthogonal aspects to structure:

    • The structure of the data itself. 
    • The structure of the container that hosts the data. 
    • The structure of the access method used to access the data.

    These three dimensions are largely independent and one does not need to imply another. For example, it is absolutely feasible and reasonable to store unstructured data in a structured database container and access it by unstructured search mechanisms."
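
    The closing example in that quote can be sketched in a few lines: free text (unstructured data) kept in a database table (structured container) and found by substring search (unstructured access). The table name, column names and sample text are made up for illustration, and an in-memory SQLite database stands in for whatever store is actually used.

```python
import sqlite3

# Structured container: a relational table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")

# Unstructured data: free text with no internal schema.
conn.executemany(
    "INSERT INTO documents (body) VALUES (?)",
    [
        ("Meeting notes: client raised pricing objections.",),
        ("Quarterly memo on plant and equipment.",),
    ],
)

def search(term):
    """Unstructured access: substring match over the text, no field-level query."""
    rows = conn.execute(
        "SELECT id FROM documents WHERE body LIKE ? ORDER BY id", (f"%{term}%",)
    )
    return [row[0] for row in rows]

print(search("pricing"))  # → [1]
```

    The container being relational says nothing about the shape of the text inside it, which is the sense in which the three dimensions are independent.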

    I have never been asked by a customer to clarify what I mean by unstructured data but I know it is coming.

    So when we say 80% of your data is unstructured, do we mean "not stored in a database"? Is XML-tagged data structured? (Yes.) What if it is stored on the file system? What about a .pdf stored in a database and indexed via a search engine?

    One participant in the Oracle conversation has this take:

    As per my experience, 'unstructured data' is data/information/content which doesn't have a specific  structure/rule attached to it. For example, a word document or an HTML page can contain data/information/content in any structure. One can have any number of images, paragraph etc. Also, in most of the cases, there is no relation between the content(s). On the other hand, 'structured data' has structure/rules attached to it e.g. a product. A product will always have a code, manufacturer, category etc. and thus defines the structure of data. 

    Now, the above is business terms. So, you can store them the way you wish to have your technical solution- it could be Database, File System etc.

    So this would basically be saying that it is the structure of the data itself that determines whether it is structured or unstructured.

    However, within the ECM space, I tend to take a different tack, at least when explaining it to myself.  I typically take a more simplistic approach.  Structured vs Unstructured is cellular data vs non-cellular data.  DB LOB types are special exception cases. 

    <disclaimer>Of course, I take this approach when presenting ECM, which deals primarily with content stored in non-DB table cell formats/locations.</disclaimer>

    While XML data may be structured, it is contained in a content item (an XML document) that is itself unstructured. Were the XML data to be parsed and inserted into a table structure that mirrored the XML tag names (for example), at that point the data in the DB would be considered "structured", while the XML document and all the data it contained would still be considered "unstructured".
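
    A small sketch of that parse-and-insert step follows. The product document and its tag names are hypothetical (borrowed from the code/manufacturer/category example quoted earlier), and SQLite is used only because it needs no setup.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical content item: as a document, this is "unstructured" in the ECM sense.
XML_DOC = (
    "<product><code>A100</code><manufacturer>Acme</manufacturer>"
    "<category>Tools</category></product>"
)

root = ET.fromstring(XML_DOC)
columns = [child.tag for child in root]   # column names mirror the XML tag names
values = [child.text for child in root]

conn = sqlite3.connect(":memory:")
# The tag names come from a trusted literal here; never build SQL from untrusted tags.
conn.execute(f"CREATE TABLE product ({', '.join(c + ' TEXT' for c in columns)})")
conn.execute(f"INSERT INTO product VALUES ({', '.join('?' for _ in columns)})", values)

# Once in table cells, the same values are "structured" (cellular) data.
row = conn.execute("SELECT code, manufacturer, category FROM product").fetchone()
print(row)  # → ('A100', 'Acme', 'Tools')
```

    Nothing about the values changed between the two forms; only the container did, which is exactly the distinction being drawn.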


     

    Filed under  //  cloud computing   computer   new   stream computing   technology  
    Posted 4 months ago

    SPADE Language Specification

    This is the language for stream computing.

    This document describes the language design of the Spade language, version 2. Spade is the programming language for InfoSphere Streams, IBM’s high-performance distributed stream processing system. This document focuses only on the syntax and semantics of user-visible features; a description of the implementation design or the inner workings of the compiler is out of scope for this document. Spade 2 is not backward compatible with Spade 1, and instead takes the opportunity to clean up several features.

    This document is sprinkled with paragraphs containing auxiliary information:
    • Practical advice: Best practices and conventions for users.
    • Implementation note: Note about how the compiler or runtime implements a feature.
    • For Spade 1 users: Comparisons between old and new language features.
    • For language experts: Terminology from the programming language community.
    • Language design rationale: Justification for decisions where we had to reconcile conflicting design goals.

    In addition, there are numerous code examples. IBM’s Spade compiler is continuously tested on these examples. While some examples are semantically incomplete (e.g., using undefined identifiers), all examples are syntactically valid.

    PDF:rc24760.pdf

    Filed under  //  IBM   langauge   stream computing  
    Posted 4 months ago

    IBM InfoSphere Streams, or "Stream Computing" for Short


    IBM InfoSphere Streams enables continuous and extremely fast analysis of massive volumes of information-in-motion to help improve business insights and decision making.

    It is a high-performance computing system that rapidly analyzes information as it streams from thousands of real-time sources, increasing the speed and accuracy of decision making in diverse fields such as healthcare, astronomy, manufacturing and financial trading.


    FEATURES AND BENEFITS

    Secure, privacy-compliant, and auditable execution environment.



    Why Google and IBM Are Ahead of the Competition:

    AN ARTICLE SUBMITTED BY TIME.Inc 

    Steve Mills, SVP of IBM Software (left), and Dr. John Kelly, SVP of IBM Research, view Stream Computing technology.


    A huge population of red ants has bedeviled Texas farmers for years. By some estimates the insects cost state businesses close to $1 billion a year due to crop and machinery destruction. Killing the ants and their nests has not proven easy.

    Texas A&M researchers have discovered that the phorid fly from South America will lay eggs on the fire ants, and the maggots that hatch eat away at the ants' brains, eventually causing their heads to fall off. Someone at the university was willing to underwrite the work to solve a problem. That investment was almost certainly much less than the $1 billion a year that fire ants cost businesses in the state.

    A recession does not stop advancements in technology. It just makes companies so frightened of risk that they choose not to make the investment in the fire ant projects.

    In the last week, the two most successful technology companies in the world, IBM (IBM) and Google (GOOG) have announced major new products. These are developments that will probably help the firms take business away from their competitors. The scope of the products' applications is broad enough that the R&D investment to create them must have been extensive.

    IBM released "stream computing" applications that allow businesses to look at and analyze huge amounts of data in real time. Describing the product, IBM said "System S is built for perpetual analytics — utilizing a new streaming architecture and breakthrough mathematical algorithms, to create a forward-looking analysis of data from any source — narrowing down precisely what people are looking for and continuously refining the answer as additional data is made available." The ability to have access to that kind of information will undoubtedly be valuable to governments, the financial industry, and large multinationals with thousands of retail outlets. The new software is unique and does not appear to have any direct competition.
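
    The phrase "continuously refining the answer as additional data is made available" describes incremental computation in general. The sketch below shows the idea with a running average in plain Python; it is not IBM's algorithm and has nothing to do with the System S or Spade internals, and the readings are invented.

```python
def running_mean(stream):
    """Yield an updated mean after every new reading, without storing the stream."""
    count, mean = 0, 0.0
    for value in stream:
        count += 1
        mean += (value - mean) / count   # fold the new value into the current answer
        yield mean

readings = iter([10.0, 20.0, 30.0])      # stands in for an unbounded live source
refined = list(running_mean(readings))
print(refined)  # → [10.0, 15.0, 20.0]
```

    A batch job would wait for all the data and compute one answer at the end; here an answer exists after the first reading and is revised with each one that follows.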

    Google also announced a new set of products. The most important one allows the company's customers to take very large amounts of search data and organize it into spreadsheets. As it released the new tools and several other innovations, Google said they would "open up whole new ways of searching that haven't previously been available." Yahoo! (YHOO) does not have anything to compete with the new technology. Microsoft (MSFT) does not either, despite its unparalleled access to capital and software engineering talent. 

    The shares of Google and IBM have handily outperformed those of all the other large tech companies based in the U.S. such as Hewlett Packard (HPQ), Microsoft (MSFT), Cisco (CSCO), and Oracle (ORCL). Each of the companies is blessed with substantial earnings and technology staffs in the tens of thousands. But, the firms are not all viewed the same, at least by investors who trade tens of millions of their shares each day.

    In most ways, IBM and Google are not like one another at all. IBM makes its money selling expensive hardware, client services, and software to companies, most of which are very large, and to governments. Google has millions of customers who pay nothing to use its services. It has millions of advertisers who spend money to reach people who look at search results and most of these marketers are very small. 

    What the companies do have in common is a willingness to take risks, probably risks with long odds, in order to launch new products. These products may be failures, but they are well enough researched and designed that they have a good chance of keeping IBM and Google ahead of the competition, even if that does not immediately involve significant new revenue.

    The fire ant problem never goes away. Unsolved problems in every industry cost companies money. Sometimes companies do not even know that their problems can be solved. The phorid fly is an obscure species. So is software that can analyze huge amounts of data in real time.

    Data were collected from the official websites of IBM and Time Inc.

    PICTURES:

    Static data versus streaming data: conceptual overview.

    Stream computing can be used to analyze market data faster than ever before. The result is a machine that helps automated trading systems determine the price of securities using financial events that have just occurred. To build the system, the computing company partnered with TD Securities, an investment-banking firm, to tweak IBM software called InfoSphere Streams for financial data. The firm ran the software on one of the latest IBM supercomputers, known as Blue Gene/P.
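
    To give a feel for pricing from "events that have just occurred", here is a toy sliding-window estimate over a stream of trade prices. It is only a sketch under stated assumptions: the window size, the prices and the plain average are all invented, and the real system described above is far more sophisticated.

```python
from collections import deque

def windowed_price(ticks, window=3):
    """Yield the average of the most recent `window` trade prices after each tick."""
    recent = deque(maxlen=window)        # older ticks fall out automatically
    for price in ticks:
        recent.append(price)
        yield sum(recent) / len(recent)

prices = [100.0, 102.0, 101.0, 105.0]    # hypothetical trade prices, oldest first
estimates = list(windowed_price(prices))
print(estimates)
```

    Memory use stays fixed no matter how long the feed runs, because only the last few events are kept, and each estimate is available as soon as its tick arrives.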

    Filed under  //  cloud computing   computer   IBM   stream computing   technology  
    Posted 5 months ago