Open source C++ trading systems

C++ libraries for trading.
Are there any free C++ libraries that provide some of the functions used when developing a trading strategy? For example, drawdown calculation, volatility forecasting, MAE, MFE, etc.
I know I could code these myself, but a library would save me some time and let me concentrate on the strategy rather than on report generation.
Here are some suggestions.
Search Amazon (or your favorite bookseller) for books on "quantitative finance C++". I found several titles that look promising.
I went to SourceForge (searching for "trading systems") and saw several promising systems that could give you a head start on drawdown, MAE, etc.
I use TradeStation 9.0 to compare trading strategies. It provides MAE/MFE charts and trade equity curves, and ranks strategies by maximum drawdown. But be sure to read Trading Systems That Work: Building and Evaluating Effective Trading Systems by Thomas Stridsman for a proper critique of the reports TradeStation generates.
To actually build your trading strategy, you can use the open source TA-Lib (written in C and callable from C++), which is available from here. To test it, you can use R and the PerformanceAnalytics package.
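For reference, the metrics named in the question (drawdown, MAE, MFE) are short enough to write by hand while evaluating libraries. The following is a minimal sketch in Python rather than C++, using the usual textbook definitions; the function names and the input format (a list of equity values, a list of prices seen while a trade was open) are assumptions made for illustration, not the API of any library mentioned above.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def mae_mfe(entry_price, prices, is_long=True):
    """Maximum adverse / favourable excursion of one trade, in price units.

    `prices` holds the prices observed while the trade was open
    (a hypothetical input format chosen for this sketch).
    """
    sign = 1.0 if is_long else -1.0
    moves = [sign * (p - entry_price) for p in prices]
    mae = max(0.0, -min(moves))  # worst move against the position
    mfe = max(0.0, max(moves))   # best move in favour of the position
    return mae, mfe


equity_curve = [100.0, 110.0, 99.0, 105.0, 120.0, 90.0, 130.0]  # made-up data
print(max_drawdown(equity_curve))         # 0.25 (the fall from 120 to 90)
print(mae_mfe(50.0, [51.0, 48.5, 53.0]))  # (1.5, 3.0)
```

The same loops translate almost line for line into C++ over a `std::vector<double>`.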

Low-latency trading systems using C++ on Windows?
It seems that all the major investment banks use C++ on Unix (Linux, Solaris) for their low-latency/high-frequency server applications. Why is Windows generally not used as a platform for this? Are there technical reasons why Windows can't compete?
The performance requirements of the ultra-low-latency systems used for algorithmic trading are extreme. In this environment, microseconds count.
I'm not sure about Solaris, but in the case of Linux, these guys are writing and using low-latency patches and customizations for the entire kernel, from the network card drivers on up. It's not that there is a technical reason this couldn't be done on Windows, but there is a practical/legal one: access to the source code and the ability to recompile it with changes.
Technically, no. However, there is a very simple business reason: the rest of the financial world runs on Unix. Banks run on AIX, the stock market itself runs on Unix, and so it is simply easier to find programmers in the financial world who are used to a Unix environment rather than a Windows one.
(I've worked in investment banking for 8 years.) Actually, quite a lot of what banks call low latency is done in Java. And not even real-time Java, just normal Java with the GC turned off. The main trick here is to make sure you have exercised all of your code enough for the JIT to have run before switching a particular VM into prod (so you have some startup loop that runs for a couple of minutes, plus hot failover).
The reasons for using Linux are:
Remote administration is still better, and it is also low-impact: it has minimal effect on the other processes on the machine. Remember, these systems are often located at the exchange, so the links to the machines (from you or your support team) will probably be worse than those to your normal datacentres.
Tunability: the ability to set swappiness to 0, get the JVM to preallocate large pages, and other low-level tricks are quite useful.
I'm sure you could get Windows to work acceptably, but there is no great advantage in doing so. As others have said, any people you poached would have to rediscover all their latency tricks rather than just run down a checklist.
The reason is simple: 10-20 years ago, when such systems emerged, "hardcore" multi-CPU servers were ONLY available on some sort of UNIX. Windows NT was in kindergarten in those days. So the reason is "historical".
Modern systems can be developed on Windows; it's just a matter of taste these days.
PS: I am currently working on one of these systems :-)
Linux/UNIX are much more usable for concurrent remote users, making it easier to script around the systems and to use standard tools like grep/sed/awk/perl/ruby/less on logs, plus ssh/scp. All that stuff is just there.
There are also technical issues. For example, to measure elapsed time on Windows you can choose between a set of functions based on the Windows clock tick and QueryPerformanceCounter(). The former increments only once every 10 to 16 milliseconds (note: some documentation implies more precision; for example, the values from GetSystemTimeAsFileTime() are expressed in units of 100 ns, but they still only change at the same clock tick). The latter, QueryPerformanceCounter(), has show-stopping issues where different cores/CPUs can report clocks-since-startup that differ by several seconds, because they were warmed up at different times during system boot. MSDN documents this as a possible BIOS bug, but it's common. So, who wants to develop low-latency trading systems on a platform that can't be instrumented properly? (There are solutions, but you won't find any software ones sitting conveniently in boost or ACE.)
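The gap between a clock's advertised units and its real update rate can be checked empirically on any platform. The sketch below uses Python's standard `time` module rather than the Win32 calls named above; `smallest_observed_tick` is a hypothetical helper written for this illustration, and the numbers it prints depend entirely on the operating system and hardware it runs on.

```python
import time


def smallest_observed_tick(clock, samples=10_000):
    """Smallest non-zero step seen between consecutive readings of `clock`."""
    smallest = None
    previous = clock()
    for _ in range(samples):
        current = clock()
        step = current - previous
        if step > 0 and (smallest is None or step < smallest):
            smallest = step
        previous = current
    return smallest  # None if the clock never advanced while sampling


# What the runtime reports about each clock on this machine.
for name in ("time", "monotonic", "perf_counter"):
    info = time.get_clock_info(name)
    print(name, info.implementation, info.resolution)

# What is actually observed when the high-resolution counter is polled.
print("observed perf_counter_ns step (ns):",
      smallest_observed_tick(time.perf_counter_ns))
```

Comparing the reported resolution with the observed step is a quick way to see whether a timer is fine-grained enough to instrument code where microseconds matter.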
Many Linux/UNIX variants have lots of easily tweakable parameters to trade off latency for a single event against average latency under load, time slice sizes, scheduling policies, etc. On open source operating systems there is also the assurance that comes with being able to consult the code when you think something should be faster than it is, and the knowledge that a (potentially huge) community of people has been and is doing so critically. With Windows, it is obviously mostly the people who are assigned to look at it.
On the FUD/reputation side (somewhat intangible, but an important part of the reasons for OS selection), I think most programmers in the industry would simply trust Linux/UNIX more to provide reliable scheduling and behaviour. Further, Linux/UNIX has a reputation for crashing less, though Windows is pretty reliable these days, and Linux has a much more volatile code base than Solaris or FreeBSD.
There are a variety of reasons, but the reason is not only historical. In fact, it seems that more server-side applications run on *nix these days than ever (including big names such as the London Stock Exchange, which switched platforms). For client-side or desktop applications it would be silly to target anything other than Windows, as that is the established platform. However, for server-side applications, most places I have worked at deploy to *nix.
I partially agree with most of the answers above. Though what I have realized is that the biggest reason to use C++ is that it is relatively fast and comes with the extensive STL.
Also, Linux/Unix systems are used to boost performance. I know many low-latency teams that go to the extent of tweaking the Linux kernel. Obviously this level of freedom is not provided by Windows.
Other reasons, such as legacy systems, license costs and resources, also count but are lesser driving factors. As "rjw" mentioned, I have seen teams use Java as well, with a modified JVM.
I second the opinions about history and access to kernel manipulation.
Apart from those reasons, I also believe that, just as they turn off garbage collection and similar mechanisms in Java when using it for low latency, they may avoid Windows because of its high-level APIs, which interact with the low-level OS and then with the kernel.
So the core is of course the kernel, which can be reached through the low-level OS. The high-level APIs are provided just to make common users' lives easier. But in the low-latency case this is a fat layer that costs a fraction of a second around each operation. Removing it is therefore an attractive way of gaining a little time on every call.
Apart from this, another thing to consider is integration. Most servers, data centers and exchanges use UNIX rather than Windows, so using clients from the same family makes integration and communication easier.
Then you have security issues (many people out there may not agree with this point, though): hacking UNIX is not easy compared to hacking Windows. I don't agree that licensing would be an issue for banks, because they shower money on every piece of hardware and software and on the people who customize them, so buying licenses is not much of an issue when weighed against what they gain by buying.

Open Source Trading Platform in C++
Does anyone know of any open source trading platform with the server side written in C++/UNIX?
Looking for a trading platform under development, for strategy execution, that can be customized at the source code level.

QuantStart.
By Michael Halls-Moore on July 26, 2013.
One of the most frequent questions I receive in the QS mailbag is "What is the best programming language for algorithmic trading?". The short answer is that there is no "best" language. Strategy parameters, performance, modularity, development, resiliency and cost must all be considered. This article outlines the necessary components of an algorithmic trading system architecture and how decisions regarding implementation affect the choice of language.
First, the major components of an algorithmic trading system will be considered, such as the research tools, portfolio optimiser, risk manager and execution engine. Subsequently, different trading strategies will be examined, along with how they affect the design of the system. In particular, the frequency of trading and the likely trading volume will both be discussed.
Once the trading strategy has been selected, it is necessary to architect the entire system. This includes the choice of hardware, the operating system(s) and system resiliency against rare, potentially catastrophic events. While the architecture is being considered, due regard must be paid to performance, both of the research tools and of the live execution environment.
What Is the Trading System Trying to Do?
Before deciding on the "best" language with which to write an automated trading system, it is necessary to define the requirements. Is the system going to be purely execution based? Will the system require a risk management or portfolio construction module? Will the system require a high-performance backtester? For most strategies the trading system can be partitioned into two categories: research and signal generation.
Research is concerned with evaluating a strategy's performance over historical data. The process of evaluating a trading strategy over prior market data is known as backtesting. The data size and algorithmic complexity will have a big impact on the computational intensity of the backtester. CPU speed and concurrency are often the limiting factors in optimising research execution speed.
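To make the idea of backtesting concrete, here is a deliberately tiny sketch in pure Python: a moving-average crossover rule evaluated over a handful of made-up closing prices. The rule, the window lengths and the function names are assumptions chosen for illustration only; a real backtester would also model transaction costs, slippage and position sizing.

```python
def sma(values, window):
    """Simple moving average; None until `window` observations are available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


def backtest_sma_cross(closes, fast=2, slow=3):
    """Hold one unit long while the fast SMA is above the slow SMA, else stay flat.

    The signal computed on bar t is applied to the return from t to t+1,
    which avoids look-ahead bias. Returns the per-bar strategy returns.
    """
    fast_ma, slow_ma = sma(closes, fast), sma(closes, slow)
    returns = []
    for t in range(len(closes) - 1):
        in_market = (
            fast_ma[t] is not None
            and slow_ma[t] is not None
            and fast_ma[t] > slow_ma[t]
        )
        bar_return = closes[t + 1] / closes[t] - 1.0
        returns.append(bar_return if in_market else 0.0)
    return returns


closes = [10.0, 11.0, 12.0, 11.0, 10.0, 11.0, 13.0]  # made-up prices
strategy_returns = backtest_sma_cross(closes)

equity = 1.0
for r in strategy_returns:
    equity *= 1.0 + r
print(round(equity, 4))
```

Even in this toy the cost structure is visible: every extra parameter (here `fast` and `slow`) multiplies the number of full passes over the data, which is why CPU speed and concurrency dominate research performance.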
Signal generation is concerned with generating a set of trading signals from an algorithm and sending the resulting orders to the market, usually via a brokerage. For certain strategies a high level of performance is required. I/O issues such as network bandwidth and latency are often the limiting factors in optimising execution systems. Thus the choice of language for each component of your system may be quite different.
Type, Frequency and Volume of Strategy.
The type of algorithmic strategy employed will have a substantial impact on the design of the system. It will be necessary to consider the markets being traded, the connectivity to external data vendors, the frequency and volume of the strategy, the trade-off between ease of development and performance optimisation, as well as any custom hardware, including custom servers, GPUs or FPGAs, that might be necessary.
The technology choices for a low-frequency US equities strategy will be vastly different from those of a high-frequency statistical arbitrage strategy trading on the futures market. Prior to the choice of language, the data vendors that pertain to the strategy at hand must be evaluated.
It will be necessary to consider connectivity to the vendor, the structure of any APIs, the timeliness of the data, storage requirements and resiliency in the face of a vendor going offline. It is also wise to have rapid access to multiple vendors! Different instruments all have their own storage quirks, examples of which include multiple ticker symbols for equities and expiration dates for futures (not to mention any specific OTC data). This needs to be factored into the platform design.
The frequency of the strategy is likely to be one of the biggest drivers of how the technology stack will be defined. Strategies that use data more frequent than minutely or secondly bars require significant consideration with regard to performance.
A strategy that goes beyond secondly bars (i.e. tick data) makes performance-driven design the primary requirement. For high-frequency strategies a substantial amount of market data will need to be stored and evaluated. Software such as HDF5 or kdb+ is commonly used for these roles.
In order to process the extensive volumes of data needed for HFT applications, an extensively optimised backtester and execution system must be used. C/C++ (possibly with some assembler) is likely to be the strongest language candidate. Ultra-high-frequency strategies will almost certainly require custom hardware such as FPGAs, exchange co-location and kernel/network interface tuning.
Research Systems.
Research systems typically involve a mixture of interactive development and automated scripting. The former often takes place within an IDE such as Visual Studio, MatLab or R Studio. The latter involves extensive numerical calculations over numerous parameters and data points. This leads to a language choice that provides a straightforward environment for testing code, but that also provides sufficient performance to evaluate strategies over multiple parameter dimensions.
Typical IDEs in this space include Microsoft Visual C++/C#, which contains extensive debugging utilities, code completion capabilities (via "Intellisense") and straightforward overviews of the entire project stack (via the database ORM, LINQ); MatLab, which is designed for extensive numerical linear algebra and vectorised operations, but in an interactive console manner; R Studio, which wraps the R statistical language console in a fully fledged IDE; Eclipse IDE for Linux Java and C++; and semi-proprietary IDEs such as Enthought Canopy for Python, which include data analysis libraries such as NumPy, SciPy, scikit-learn and pandas in a single interactive (console) environment.
For numerical backtesting, all of the above languages are suitable, although it is not necessary to use a GUI/IDE as the code will be executed "in the background". The prime consideration at this stage is execution speed. A compiled language (such as C++) is often useful if the backtesting parameter dimensions are large. Remember that it is necessary to be wary of such systems if that is the case!
Interpreted languages such as Python often make use of high-performance libraries such as NumPy/pandas for the backtesting step, in order to maintain a reasonable degree of competitiveness with compiled equivalents. Ultimately the language chosen for backtesting will be determined by specific algorithmic needs as well as the range of libraries available in the language (more on that below). However, the language used for the backtester and research environments can be completely independent of those used in the portfolio construction, risk management and execution components, as will be seen.
Portfolio Construction and Risk Management.
The portfolio construction and risk management components are often overlooked by retail algorithmic traders. This is almost always a mistake. These tools provide the mechanism by which capital will be preserved. They not only attempt to reduce the number of "risky" bets, but also minimise churn of the trades themselves, reducing transaction costs.
Sophisticated versions of these components can have a significant effect on the quality and consistency of profitability. It is straightforward to create a stable of strategies, as the portfolio construction mechanism and risk manager can easily be modified to handle multiple systems. Thus they should be considered essential components at the outset of the design of an algorithmic trading system.
The job of the portfolio construction system is to take a set of desired trades and produce the set of actual trades that minimise churn, maintain exposures to various factors (such as sectors, asset classes, volatility, etc.) and optimise the allocation of capital to the various strategies in a portfolio.
Portfolio construction often reduces to a linear algebra problem (such as a matrix factorisation) and hence performance is highly dependent upon the effectiveness of the numerical linear algebra implementation available. Common libraries include uBLAS, LAPACK and NAG for C++. MatLab also possesses extensively optimised matrix operations. Python utilises NumPy/SciPy for such computations. A frequently rebalanced portfolio will require a compiled (and well optimised!) matrix library to carry out this step, so as not to bottleneck the trading system.
Risk management is another extremely important part of an algorithmic trading system. Risk can come in many forms: increased volatility (although this may be seen as desirable for certain strategies!), increased correlations between asset classes, counterparty default, server outages, "black swan" events and undetected bugs in the trading code, to name a few.
Risk management components try to anticipate the effects of excessive volatility and correlation between asset classes, and their subsequent effect(s) on trading capital. Often this reduces to a set of statistical computations such as Monte Carlo "stress tests". This is very similar to the computational needs of a derivatives pricing engine and as such will be CPU-bound. These simulations are highly parallelisable (see below) and, to a certain degree, it is possible to "throw hardware at the problem".
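As a rough illustration of a Monte Carlo "stress test", the sketch below simulates one-period returns of a two-asset portfolio under a calm regime and under a stressed regime with higher volatility and correlation, then compares the 99% value-at-risk of each. All figures (volatilities, correlations, weights) are invented for the example, Gaussian shocks are a simplifying assumption, and the function names are hypothetical.

```python
import random


def simulate_portfolio_returns(n_paths, vol_a, vol_b, correlation,
                               weight_a=0.5, seed=42):
    """Draw one-period returns of a two-asset portfolio with correlated Gaussian shocks."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_paths):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Build a second shock with the requested correlation to the first.
        shock_b = correlation * z1 + (1.0 - correlation ** 2) ** 0.5 * z2
        ret_a = vol_a * z1
        ret_b = vol_b * shock_b
        results.append(weight_a * ret_a + (1.0 - weight_a) * ret_b)
    return results


def value_at_risk(returns, level=0.99):
    """Loss exceeded in only (1 - level) of the simulated scenarios."""
    ordered = sorted(returns)
    index = int((1.0 - level) * len(ordered))
    return -ordered[index]


normal = simulate_portfolio_returns(20_000, vol_a=0.01, vol_b=0.015, correlation=0.2)
stressed = simulate_portfolio_returns(20_000, vol_a=0.03, vol_b=0.045, correlation=0.9)
print("99% VaR, normal regime:  ", round(value_at_risk(normal), 4))
print("99% VaR, stressed regime:", round(value_at_risk(stressed), 4))
```

Each path is independent of the others, which is what makes this kind of workload CPU-bound and easy to spread across cores.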
Execution Systems.
The job of the execution system is to receive filtered trading signals from the portfolio construction and risk management components and send them on to a brokerage or other means of market access. For the majority of retail algorithmic trading strategies this involves an API or FIX connection to a brokerage such as Interactive Brokers. The primary considerations when deciding upon a language include the quality of the API, the availability of a language wrapper for the API, the execution frequency and the anticipated slippage.
The "quality" of the API refers to how well documented it is, what sort of performance it provides, whether it needs standalone software to be accessed, or whether a gateway can be established in a headless fashion (i.e. with no GUI). In the case of Interactive Brokers, the Trader WorkStation tool needs to be running in a GUI environment in order to access their API. I once had to install a Desktop Ubuntu edition onto an Amazon cloud server to access Interactive Brokers remotely, purely for this reason!
Most APIs will provide a C++ and/or Java interface. It is usually up to the community to develop language-specific wrappers for C#, Python, R, Excel and MatLab. Note that with every additional plugin utilised (especially API wrappers) there is scope for bugs to creep into the system. Always test plugins of this sort and ensure they are actively maintained. A worthwhile gauge is to see how many new updates to a codebase have been made in recent months.
Execution frequency is of the utmost importance in the execution algorithm. Note that hundreds of orders may be sent every minute and as such performance is critical. Slippage will be incurred through a badly performing execution system and this will have a dramatic impact on profitability.
Statically typed languages (see below) such as C++/Java are generally optimal for execution, but there is a trade-off in development time, testing and ease of maintenance. Dynamically typed languages, such as Python and Perl, are now generally "fast enough". Always make sure the components are designed in a modular fashion (see below) so that they can be "swapped out" as the system scales.
Architectural Planning and Development Process.
The components of a trading system, and its frequency and volume requirements, have been discussed above, but system infrastructure has yet to be covered. Those acting as a retail trader or working in a small fund will likely be "wearing many hats". It will be necessary to cover the alpha model, the risk management and execution parameters, and also the final implementation of the system. Before delving into specific languages, the design of an optimal system architecture will be discussed.
Separation of Concerns.
One of the most important decisions that must be made at the outset is how to "separate the concerns" of a trading system. In software development, this essentially means how to break up the different aspects of the trading system into separate modular components.
By exposing interfaces at each of the components, it is easy to swap out parts of the system for other versions that aid performance, reliability or maintenance, without modifying any external dependency code. This is the "best practice" for such systems. For strategies at lower frequencies such practices are advised. For high-frequency trading the rulebook might have to be ignored in order to tweak the system for even more performance. A more tightly coupled system may be desirable.
Creating a component map of an algorithmic trading system is worth an article in itself. However, an optimal approach is to make sure there are separate components for the historical and real-time market data inputs, data storage, data access API, backtester, strategy parameters, portfolio construction, risk management and automated execution systems.
For instance, if the data store being used is currently underperforming, even at significant levels of optimisation, it can be swapped out with minimal rewrites to the data ingestion or data access API. As far as the backtester and subsequent components are concerned, there is no difference.
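The data-store swap described above can be sketched with a small interface. In this illustration the "backtester" depends only on an abstract `BarStore`, so either backend can be plugged in without touching it. All class and function names here are hypothetical, and the CSV-text backend is just a stand-in for a slower storage layer.

```python
from abc import ABC, abstractmethod


class BarStore(ABC):
    """Interface the backtester depends on (hypothetical name)."""

    @abstractmethod
    def closes(self, symbol):
        """Return the list of closing prices for `symbol`."""


class InMemoryStore(BarStore):
    def __init__(self, data):
        self._data = data

    def closes(self, symbol):
        return list(self._data[symbol])


class CsvTextStore(BarStore):
    """Stand-in for a slower backend: parses 'symbol,close' lines on every call."""

    def __init__(self, text):
        self._text = text

    def closes(self, symbol):
        rows = (line.split(",") for line in self._text.strip().splitlines())
        return [float(close) for sym, close in rows if sym == symbol]


def mean_close(store, symbol):
    """A 'backtester' that only knows about the BarStore interface."""
    prices = store.closes(symbol)
    return sum(prices) / len(prices)


fast = InMemoryStore({"XYZ": [10.0, 11.0, 12.0]})
slow = CsvTextStore("XYZ,10.0\nXYZ,11.0\nABC,5.0\nXYZ,12.0\n")
print(mean_close(fast, "XYZ"), mean_close(slow, "XYZ"))  # 11.0 11.0
```

Because `mean_close` never names a concrete class, replacing an underperforming store is a one-line change at the point where the store is constructed.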
Another benefit of separated components is that it allows a variety of programming languages to be used in the overall system. There is no need to be restricted to a single language if the communication method of the components is language independent. This will be the case if they are communicating via TCP/IP, ZeroMQ or some other language-independent protocol.
As a concrete example, consider the case of a backtesting system being written in C++ for "number crunching" performance, while the portfolio manager and execution systems are written in Python using SciPy and IBPy.
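A language-independent link between components can be as simple as newline-delimited JSON over a socket, which any of the languages discussed here can read and write. The sketch below is one possible arrangement, not a prescribed protocol: the message fields are invented, and `socket.socketpair()` stands in for a real TCP connection between two separate processes so that the example runs on its own.

```python
import json
import socket


def send_message(sock, payload):
    """Serialise a dict as one newline-terminated JSON line."""
    sock.sendall(json.dumps(payload).encode("utf-8") + b"\n")


def receive_message(sock):
    """Read bytes until a newline, then decode the JSON line.

    Sufficient for this strict request/response exchange; a production
    reader would also buffer any bytes that arrive after the newline.
    """
    buffer = b""
    while not buffer.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buffer += chunk
    return json.loads(buffer.decode("utf-8"))


# socketpair() stands in for a TCP connection between two components.
strategy_side, execution_side = socket.socketpair()
try:
    send_message(strategy_side, {"symbol": "XYZ", "side": "BUY", "quantity": 100})
    order = receive_message(execution_side)
    send_message(execution_side, {"status": "ACCEPTED", "symbol": order["symbol"]})
    ack = receive_message(strategy_side)
finally:
    strategy_side.close()
    execution_side.close()

print(order, ack)
```

Nothing in the wire format is specific to Python, so the component on either end could be rewritten in C++ or Java without the other side noticing.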
Performance Considerations.
Performance is a significant consideration for most trading strategies. For higher-frequency strategies it is the most important factor. "Performance" covers a wide range of issues, such as algorithmic execution speed, network latency, bandwidth, data I/O, concurrency/parallelism and scaling. Each of these areas is individually covered by large textbooks, so this article will only scratch the surface of each topic. Architecture and language choice will now be discussed in terms of their effects on performance.
The prevailing wisdom, as stated by Donald Knuth, one of the fathers of computer science, is that "premature optimisation is the root of all evil". This is almost always the case, except when building a high-frequency trading algorithm! For those who are interested in lower-frequency strategies, a common approach is to build a system in the simplest way possible and only optimise as bottlenecks begin to appear.
Profiling tools are used to determine where bottlenecks arise. Profiles can be made for all of the factors listed above, in either an MS Windows or a Linux environment. There are many operating system and language tools available to do so, as well as third-party utilities. Language choice will now be discussed in the context of performance.
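As one example of a language-level profiling tool, Python ships with `cProfile` and `pstats` in its standard library. The sketch below profiles a deliberately naive rolling-mean calculation; the functions `slow_signal` and `run_backtest` are made up for the illustration, and the timings printed will vary by machine.

```python
import cProfile
import io
import pstats


def slow_signal(prices, window):
    """Deliberately naive rolling mean: re-sums the whole window at every step."""
    return [sum(prices[i - window:i]) / window
            for i in range(window, len(prices) + 1)]


def run_backtest():
    prices = [100.0 + (i % 50) for i in range(20_000)]
    return slow_signal(prices, window=200)


profiler = cProfile.Profile()
signal = profiler.runcall(run_backtest)  # run the function under the profiler

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report lists the functions with the largest cumulative time first, which here points at the repeated `sum` over the window as the place to optimise (for instance with a running total).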
C++, Java, Python, R and MatLab all have high-performance libraries (either as part of their standard or externally) for basic data structure and algorithmic work. C++ ships with the Standard Template Library, while Python has NumPy/SciPy available as external libraries. Common mathematical tasks are to be found in these libraries and it is rarely beneficial to write a new implementation.
One exception is if highly customised hardware architecture is required and an algorithm is making extensive use of proprietary extensions (such as custom caches). However, "reinventing the wheel" often wastes time that might be better spent developing and optimising other parts of the trading infrastructure. Development time is extremely precious, especially in the context of sole developers.
Latency is often an issue of the execution system, as the research tools are usually situated on the same machine. For the former, latency can occur at multiple points along the execution path. Databases must be consulted (disk/network latency), signals must be generated (operating system and kernel messaging latency), trade signals sent (NIC latency) and orders processed (exchange systems' internal latency).
For higher-frequency operations it is necessary to become intimately familiar with kernel optimisation as well as optimisation of network transmission. This is a deep area and is significantly beyond the scope of the article, but if a UHFT algorithm is desired then be aware of the depth of knowledge required!
Caching is very useful in the toolkit of a quantitative trading developer. Caching refers to the concept of storing frequently accessed data in a manner that allows higher-performance access, at the expense of potential staleness of the data. A common use case occurs in web development when taking data from a disk-backed relational database and putting it into memory. Any subsequent requests for the data do not have to "hit the database" and so performance gains can be significant.
For trading situations caching can be extremely beneficial. For instance, the current state of a strategy portfolio can be stored in a cache until it is rebalanced, such that the list doesn't need to be regenerated upon each loop of the trading algorithm. Such regeneration is likely to be a high CPU or disk I/O operation.
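The portfolio-state example can be sketched as a small cache that is filled on first use and explicitly invalidated on rebalance. The class and function names are hypothetical, and `expensive_targets` is only a placeholder for whatever CPU- or disk-heavy calculation produces the portfolio state in a real system.

```python
class PortfolioCache:
    """Caches the expensive target-portfolio computation until the next rebalance."""

    def __init__(self, compute_targets):
        self._compute_targets = compute_targets  # expensive callable (hypothetical)
        self._targets = None
        self.recomputations = 0

    def targets(self):
        if self._targets is None:      # cache miss: regenerate once
            self._targets = self._compute_targets()
            self.recomputations += 1
        return self._targets           # cache hit: reuse the stored state

    def invalidate(self):
        """Call on rebalance so that stale targets are not served."""
        self._targets = None


def expensive_targets():
    # Placeholder for a CPU- or disk-heavy calculation.
    return {"XYZ": 0.6, "ABC": 0.4}


cache = PortfolioCache(expensive_targets)
for _ in range(1_000):        # 1,000 cycles of the trading loop
    cache.targets()
cache.invalidate()            # a rebalance happens
cache.targets()
print(cache.recomputations)   # 2
```

A thousand loop iterations trigger a single computation; the explicit `invalidate()` call is where the staleness trade-off mentioned above is managed.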
However, caching is not without its own issues. Regeneration of cache data all at once, due to the volatile nature of cache storage, can place significant demand on infrastructure. Another issue is dog-piling, where multiple regenerations of a new cache copy are carried out simultaneously under extremely high load, which leads to cascade failure.
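One common mitigation for dog-piling is to let only one caller regenerate a missing entry while the others wait for its result. The sketch below shows this with a lock and a second check inside the lock; it is one possible approach under the assumption of a single process with multiple threads, and the names are invented for the example.

```python
import threading
import time


class SingleFlightCache:
    """Lets only one thread regenerate a missing entry; the others wait for it."""

    def __init__(self, regenerate):
        self._regenerate = regenerate
        self._lock = threading.Lock()
        self._value = None
        self._ready = False
        self.regenerations = 0

    def get(self):
        if self._ready:              # fast path, no locking
            return self._value
        with self._lock:
            if not self._ready:      # re-check: another thread may have filled it
                self._value = self._regenerate()
                self.regenerations += 1
                self._ready = True
            return self._value


def slow_regenerate():
    time.sleep(0.05)                 # simulate an expensive rebuild
    return "fresh snapshot"


cache = SingleFlightCache(slow_regenerate)
threads = [threading.Thread(target=cache.get) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cache.regenerations)           # 1
```

Twenty concurrent requests for a cold entry result in a single rebuild instead of twenty, which is exactly the load spike that dog-piling would otherwise create.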
Dynamic memory allocation is an expensive operation in software execution. Thus it is imperative for higher-performance trading applications to be well aware of how memory is being allocated and deallocated during program flow. Newer languages such as Java, C# and Python all perform automatic garbage collection, which refers to deallocation of dynamically allocated memory when objects go out of scope.
Garbage collection is extremely useful during development as it reduces errors and aids readability. However, it is often sub-optimal for certain high-frequency trading strategies. Custom garbage collection is often desired for these cases. In Java, for instance, by tuning the garbage collector and heap configuration, it is possible to obtain high performance for HFT strategies.
C++ doesn't provide a native garbage collector and so it is necessary to handle all memory allocation/deallocation as part of an object's implementation. While potentially error prone (potentially leading to dangling pointers), it is extremely useful to have fine-grained control of how objects appear on the heap for certain applications. When choosing a language, make sure to study how the garbage collector works and whether it can be modified to optimise for a particular use case.
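As a small example of adjusting a collector for a specific use case, CPython exposes its cyclic garbage collector through the standard `gc` module. (In CPython most objects are freed immediately by reference counting; `gc` only controls the extra pass that finds reference cycles.) The sketch below pauses that pass during a latency-sensitive loop and runs it afterwards at a moment of the program's choosing. The function `process_ticks` and its data are made up for illustration.

```python
import gc


def process_ticks(ticks):
    """Latency-sensitive loop: runs with the cyclic collector paused."""
    was_enabled = gc.isenabled()
    gc.disable()                 # no cyclic-GC pauses inside the hot loop
    try:
        total = 0.0
        for price, size in ticks:
            total += price * size
        return total
    finally:
        if was_enabled:
            gc.enable()          # restore the previous collector state
        gc.collect()             # pay the collection cost at a moment we choose


ticks = [(100.0, 2), (101.0, 1), (99.5, 4)]  # made-up (price, size) pairs
print(process_ticks(ticks))      # 699.0
```

The same idea, deferring collection work away from the critical path, underlies the JVM tuning mentioned above, although the mechanisms and flags there are entirely different.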
Many operations in algorithmic trading systems are amenable to parallelisation. This refers to the concept of carrying out multiple programmatic operations at the same time, i.e. in "parallel". So-called "embarrassingly parallel" algorithms include steps that can be computed fully independently of other steps. Certain statistical operations, such as Monte Carlo simulations, are a good example of embarrassingly parallel algorithms, as each random draw and subsequent path operation can be computed without knowledge of other paths.
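The independence of Monte Carlo paths can be shown by splitting a simulation into batches, giving each batch its own seeded random generator, and handing the batches to a pool of workers. This is a minimal sketch with invented parameters. It uses a thread pool so that it runs anywhere without extra setup; in CPython, threads will not speed up CPU-bound work because of the global interpreter lock, so a real run would more likely use `ProcessPoolExecutor` with a module-level function in place of the lambda.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_batch(seed, n_paths, n_steps=50, vol=0.01):
    """Average terminal value of `n_paths` independent random-walk paths."""
    rng = random.Random(seed)    # private generator: no state shared between batches
    total = 0.0
    for _ in range(n_paths):
        value = 1.0
        for _ in range(n_steps):
            value *= 1.0 + rng.gauss(0.0, vol)
        total += value
    return total / n_paths


def run_serial(seeds, n_paths):
    return [simulate_batch(seed, n_paths) for seed in seeds]


def run_parallel(seeds, n_paths, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with `seeds`.
        return list(pool.map(lambda seed: simulate_batch(seed, n_paths), seeds))


seeds = list(range(8))
serial = run_serial(seeds, n_paths=200)
parallel = run_parallel(seeds, n_paths=200)
print(serial == parallel)        # True
```

Because no batch reads anything produced by another, the parallel run reproduces the serial results exactly, regardless of the order in which the workers finish.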
Other algorithms are only partially parallelisable. Fluid dynamics simulations are one example, where the domain of computation can be subdivided, but ultimately these sub-domains must communicate with each other and thus the operations are partially sequential. Parallelisable algorithms are subject to Amdahl's Law, which provides a theoretical upper limit on the performance increase of a parallelised algorithm when it is spread across $N$ separate processes (e.g. on a CPU core or thread).
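Amdahl's Law is usually written as $S(N) = 1 / ((1 - p) + p/N)$, where $p$ is the fraction of the runtime that can be parallelised. A small sketch makes the upper limit concrete; the function name is just an illustrative choice.

```python
def amdahl_speedup(parallel_fraction, n_processes):
    """Theoretical speedup when a fraction p of the work runs on N processes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_processes < 1:
        raise ValueError("n_processes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processes)
```

If 90% of a backtest can be parallelised, the speedup can never exceed 1 / (1 - 0.9) = 10, however many cores are added, because the serial 10% still has to run on one core.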
Parallelisation has become increasingly important as a means of optimisation, since processor clock speeds have stagnated while newer processors contain many cores with which to perform parallel calculations. The rise of consumer graphics hardware (predominantly for video games) has led to the development of Graphics Processing Units (GPUs), which contain hundreds of "cores" for highly concurrent operations. Such GPUs are now very affordable. High-level frameworks, such as Nvidia's CUDA, have led to widespread adoption in academia and finance.
Such GPU hardware is generally only suitable for the research aspect of quantitative finance, whereas other more specialised hardware (including Field-Programmable Gate Arrays, or FPGAs) is used for (U)HFT. Nowadays, most modern languages support a degree of concurrency/multithreading. Thus it is straightforward to optimise a backtester, since the calculations are generally independent of one another.
Scaling in software engineering and operations refers to the ability of a system to handle consistently increasing loads in the form of more requests, higher processor usage and more memory allocation. In algorithmic trading, a strategy is able to scale if it can accept larger quantities of capital and still produce consistent returns. The trading technology stack scales if it can endure larger trade volumes and increased latency without bottlenecking.
While systems must be designed to scale, it is often hard to predict beforehand where a bottleneck will occur. Rigorous logging, testing, profiling and monitoring will aid greatly in allowing a system to scale. Languages themselves are often described as "unscalable". This is usually the result of misinformation rather than hard fact. It is the total technology stack that should be assessed for scalability, not the language. Clearly certain languages have greater performance than others in particular use cases, but one language is never "better" than another in every sense.
One means of managing scale is to separate concerns, as stated above. In order to further introduce the ability to handle "spikes" in the system (i.e. sudden volatility which triggers a raft of trades), it is useful to create a "message queuing architecture". This simply means placing a message queue system between components so that orders are "stacked up" if a certain component is unable to process many requests.
Rather than requests being lost, they are simply kept in the queue until they are handled. This is particularly useful for sending trades to an execution engine. If the engine is suffering under heavy latency then it will back up trades. A queue between the trade signal generator and the execution API will alleviate this issue at the expense of potential trade slippage. A well-respected open source message queue broker is RabbitMQ.
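The queuing idea can be sketched inside a single process with the standard library. This is only a stand-in for a real broker such as RabbitMQ, which runs as a separate service; `run_pipeline` and `handle_order` are hypothetical names, and `handle_order` represents whatever call submits an order to the execution API.

```python
import queue
import threading


def run_pipeline(signals, handle_order, max_pending=1000):
    """Push signals through a bounded queue to a single execution worker."""
    orders = queue.Queue(maxsize=max_pending)
    stop = object()                      # sentinel telling the worker to exit

    def execution_worker():
        while True:
            order = orders.get()         # waits until an order is available
            if order is stop:
                break
            handle_order(order)          # hypothetical call into a broker API

    worker = threading.Thread(target=execution_worker)
    worker.start()
    for signal in signals:               # the signal generator side
        orders.put(signal)               # blocks only if the queue is full
    orders.put(stop)
    worker.join()
```

The signal generator keeps producing while the worker is slow, and orders are handled in the order they were queued. The cost is the one noted above: an order that waits in the queue may be filled at a worse price.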
Hardware and Operating Systems.
The hardware running your strategy can have a significant impact on the profitability of your algorithm. This is not an issue restricted to high frequency traders either. A poor choice in hardware and operating system can lead to a machine crash or reboot at the most inopportune moment. Thus it is necessary to consider where your application will reside. The choice is generally between a personal desktop machine, a remote server, a "cloud" provider or an exchange co-located server.
Desktop machines are simple to install and administer, especially with newer user-friendly operating systems such as Windows 7/8, Mac OSX and Ubuntu. Desktop systems do possess some significant drawbacks, however. The foremost is that the versions of operating systems designed for desktop machines are likely to require reboots/patching (and often at the worst of times!). They also use up more computational resources by virtue of requiring a graphical user interface (GUI).
Utilising hardware in a home (or local office) environment can lead to internet connectivity and power uptime problems. The main benefit of a desktop system is that significant computational horsepower can be purchased for a fraction of the cost of a remote dedicated server (or cloud-based system) of comparable speed.
A dedicated server or cloud-based machine, while often more expensive than a desktop option, allows for more significant redundancy infrastructure, such as automated data backups, the ability to more straightforwardly ensure uptime and remote monitoring. They are harder to administer since they require the ability to use remote login capabilities of the operating system.
In Windows this is generally via the GUI Remote Desktop Protocol (RDP). In Unix-based systems the command-line Secure SHell (SSH) is used. Unix-based server infrastructure is almost always command-line based, which immediately renders GUI-based programming tools (such as MatLab or Excel) unusable.
A co-located server, as the phrase is used in the capital markets, is simply a dedicated server that resides within an exchange in order to reduce latency of the trading algorithm. This is absolutely necessary for certain high frequency trading strategies, which rely on low latency in order to generate alpha.
The final aspect to hardware choice and the choice of programming language is platform-independence. Is there a need for the code to run across multiple different operating systems? Is the code designed to be run on a particular type of processor architecture, such as the Intel x86/x64 or will it be possible to execute on RISC processors such as those manufactured by ARM? These issues will be highly dependent upon the frequency and type of strategy being implemented.
Resilience and Testing.
One of the best ways to lose a lot of money on algorithmic trading is to create a system with no resiliency. This refers to the durability of the system when subject to rare events, such as brokerage bankruptcies, sudden excess volatility, region-wide downtime for a cloud server provider or the accidental deletion of an entire trading database. Years of profits can be eliminated within seconds with a poorly-designed architecture. It is absolutely essential to consider issues such as debugging, testing, logging, backups, high-availability and monitoring as core components of your system.
It is likely that in any reasonably complicated custom quantitative trading application at least 50% of development time will be spent on debugging, testing and maintenance.
Nearly all programming languages either ship with an associated debugger or possess well-respected third-party alternatives. In essence, a debugger allows execution of a program with insertion of arbitrary break points in the code path, which temporarily halt execution in order to investigate the state of the system. The main benefit of debugging is that it is possible to investigate the behaviour of code prior to a known crash point.
Debuggers are an essential component in the toolbox for analysing programming errors. However, they are more widely used in compiled languages such as C++ or Java, as interpreted languages such as Python are often easier to debug due to fewer LOC and less verbose statements. Despite this tendency, Python does ship with pdb, which is a sophisticated debugging tool. The Microsoft Visual C++ IDE possesses extensive GUI debugging utilities, while for the command-line Linux C++ programmer, the gdb debugger exists.
Testing in software development refers to the process of applying known parameters and results to specific functions, methods and objects within a codebase, in order to simulate behaviour and evaluate multiple code-paths, helping to ensure that a system behaves as it should. A more recent paradigm is known as Test Driven Development (TDD), where test code is developed against a specified interface with no implementation. Prior to the completion of the actual codebase all tests will fail. As code is written to "fill in the blanks", the tests will eventually all pass, at which point development should cease.
TDD requires extensive upfront specification design as well as a healthy degree of discipline in order to carry out successfully. In C++, Boost provides a unit testing framework. In Java, the JUnit library exists to fulfill the same purpose. Python also has the unittest module as part of the standard library. Many other languages possess unit testing frameworks and often there are multiple options.
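To make the testing idea concrete, here is a small sketch using Python's `unittest` module. The function under test, `max_drawdown`, is a hypothetical example chosen because drawdown is a typical trading metric; in a TDD workflow the two test methods would be written first and would fail until the function body was filled in.

```python
import unittest


def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


class MaxDrawdownTest(unittest.TestCase):
    def test_rising_curve_has_no_drawdown(self):
        self.assertEqual(max_drawdown([100.0, 110.0, 120.0]), 0.0)

    def test_drawdown_is_measured_from_running_peak(self):
        # Peak of 120 followed by a trough of 90: (120 - 90) / 120 = 0.25
        self.assertAlmostEqual(max_drawdown([100.0, 120.0, 90.0, 130.0]), 0.25)
```

Saved in a file, the tests can be run with `python -m unittest <file>`.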
In a production environment, sophisticated logging is absolutely essential. Logging refers to the process of outputting messages, with various degrees of severity, regarding execution behaviour of a system to a flat file or database. Logs are a "first line of attack" when hunting for unexpected program runtime behaviour. Unfortunately the shortcomings of a logging system tend only to be discovered after the fact! As with backups discussed below, a logging system should be given due consideration BEFORE a system is designed.
Both Microsoft Windows and Linux come with extensive system logging capability and programming languages tend to ship with standard logging libraries that cover most use cases. It is often wise to centralise logging information in order to analyse it at a later date, since it can often lead to ideas about improving performance or error reduction, which will almost certainly have a positive impact on your trading returns.
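As a sketch of severity-based logging to a flat file, the example below uses Python's standard `logging` module. The logger name `"trading"`, the format string and the choice of `INFO` as the file threshold are assumptions for illustration.

```python
import logging


def build_trade_logger(path):
    """Return a logger that writes INFO and above to a flat file."""
    logger = logging.getLogger("trading")   # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:                 # avoid duplicate handlers on re-use
        handler = logging.FileHandler(path)
        handler.setLevel(logging.INFO)      # DEBUG messages are filtered out
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Calling `build_trade_logger("trading.log").info("order filled")` appends a timestamped line to the file, while `.debug(...)` calls are dropped by the handler. Centralising such files from several machines is a separate concern that this sketch does not address.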
While logging of a system will provide information about what has transpired in the past, monitoring of an application will provide insight into what is happening right now. All aspects of the system should be considered for monitoring. System level metrics such as disk usage, available memory, network bandwidth and CPU usage provide basic load information.
Trading metrics such as abnormal prices/volume, sudden rapid drawdowns and account exposure for different sectors/markets should also be continuously monitored. Further, a threshold system should be instigated that provides notification when certain metrics are breached, elevating the notification method (email, SMS, automated phone call) depending upon the severity of the metric.
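The threshold system with escalating notifications might be sketched as follows. The three severity levels, their mapping to email, SMS and phone, and all names here are illustrative assumptions; actually sending the notification is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    warn: float        # breach -> email
    critical: float    # breach -> SMS
    emergency: float   # breach -> automated phone call


def notification_channel(threshold, value):
    """Return the channel to notify on, or None if the metric is within limits."""
    if value >= threshold.emergency:
        return "phone"
    if value >= threshold.critical:
        return "sms"
    if value >= threshold.warn:
        return "email"
    return None
```

For example, with `Threshold("drawdown", warn=0.05, critical=0.10, emergency=0.20)`, a 7% drawdown maps to an email and a 25% drawdown to a phone call.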
System monitoring is often the domain of the system administrator or operations manager. However, as a sole trading developer, these metrics must be established as part of the larger design. Many solutions for monitoring exist: proprietary, hosted and open source, which allow extensive customisation of metrics for a particular use case.
Backups and high availability should be prime concerns of a trading system. Consider the following two questions: 1) If an entire production database of market data and trading history was deleted (without backups) how would the research and execution algorithm be affected? 2) If the trading system suffers an outage for an extended period (with open positions) how would account equity and ongoing profitability be affected? The answers to both of these questions are often sobering!
It is imperative to put in place a system for backing up data and also for testing the restoration of such data. Many individuals do not test a restore strategy. If recovery from a crash has not been tested in a safe environment, what guarantees exist that restoration will be available at the worst possible moment?
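A restore test can be automated. The sketch below assumes, purely for illustration, that the data lives in SQLite: it takes a backup with the standard library's `Connection.backup`, reopens the copy as a restore would, and checks both its integrity and its row count. The function name and the single-table check are assumptions; a real system would verify more than one table.

```python
import sqlite3


def backup_and_verify(source, backup_path, table):
    """Copy `source` to `backup_path`, then reopen the copy and check it."""
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)            # online backup of the whole database
    finally:
        target.close()

    restored = sqlite3.connect(backup_path)   # the "restore" step
    try:
        status = restored.execute("PRAGMA integrity_check").fetchone()[0]
        # `table` is assumed to be a trusted identifier, not user input.
        count_sql = f'SELECT COUNT(*) FROM "{table}"'
        restored_rows = restored.execute(count_sql).fetchone()[0]
        source_rows = source.execute(count_sql).fetchone()[0]
    finally:
        restored.close()
    return status == "ok" and restored_rows == source_rows
```

Running a check like this on a schedule means a broken backup is discovered long before it is needed.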
Similarly, high availability needs to be "baked in from the start". Redundant infrastructure (even at additional expense) must always be considered, as the cost of downtime is likely to far outweigh the ongoing maintenance cost of such systems. I won't delve too deeply into this topic as it is a large area, but make sure it is one of the first considerations given to your trading system.
Choosing a Language.
Considerable detail has now been provided on the various factors that arise when developing a custom high-performance algorithmic trading system. The next stage is to discuss how programming languages are generally categorised.
Type Systems.
When choosing a language for a trading stack it is necessary to consider the type system. The languages which are of interest for algorithmic trading are either statically- or dynamically-typed. A statically-typed language performs checks of the types (e.g. integers, floats, custom classes etc) during the compilation process. Such languages include C++ and Java. A dynamically-typed language performs the majority of its type-checking at runtime. Such languages include Python, Perl and JavaScript.
For a highly numerical system such as an algorithmic trading engine, type-checking at compile time can be extremely beneficial, as it can eliminate many bugs that would otherwise lead to numerical errors. However, type-checking doesn't catch everything, and this is where exception handling comes in due to the necessity of having to handle unexpected operations. 'Dynamic' languages (i.e. those that are dynamically-typed) can often lead to run-time errors that would otherwise be caught with a compilation-time type-check. For this reason, the concept of TDD (see above) and unit testing arose which, when carried out correctly, often provides more safety than compile-time checking alone.
Another benefit of statically-typed languages is that the compiler is able to make many optimisations that are otherwise unavailable to the dynamically-typed language, simply because the type (and thus the memory requirement) is known at compile-time. In fact, part of the inefficiency of many dynamically-typed languages stems from the fact that certain objects must be type-inspected at run-time and this carries a performance hit. Libraries for dynamic languages, such as NumPy/SciPy, alleviate this issue by enforcing a type within arrays.
Open Source or Proprietary?
One of the biggest choices available to an algorithmic trading developer is whether to use proprietary (commercial) or open source technologies. There are advantages and disadvantages to both approaches. It is necessary to consider how well a language is supported, the activity of the community surrounding a language, ease of installation and maintenance, quality of the documentation and any licensing/maintenance costs.
The Microsoft stack (including Visual C++, Visual C#) and MathWorks' MatLab are two of the larger proprietary choices for developing custom algorithmic trading software. Both tools have had significant "battle testing" in the financial space, with the former making up the predominant software stack for investment banking trading infrastructure and the latter being heavily used for quantitative trading research within investment funds.
Microsoft and MathWorks both provide extensive high quality documentation for their products. Further, the communities surrounding each tool are very large with active web forums for both. The software allows cohesive integration with multiple languages such as C++, C# and VB, as well as easy linkage to other Microsoft products such as the SQL Server database via LINQ. MatLab also has many plugins/libraries (some free, some commercial) for nearly any quantitative research domain.
There are also drawbacks. With either piece of software the costs are not insignificant for a lone trader (although Microsoft does provide an entry-level version of Visual Studio for free). Microsoft tools "play well" with each other, but integrate less well with external code. Visual Studio must also be executed on Microsoft Windows, which is arguably far less performant than an equivalent Linux server which is optimally tuned.
MatLab also lacks a few key plugins such as a good wrapper around the Interactive Brokers API, one of the few brokers amenable to high-performance algorithmic trading. The main issue with proprietary products is the lack of availability of the source code. This means that if ultra performance is truly required, both of these tools will be far less attractive.
Open source tools have been industry grade for some time. Much of the alternative asset space makes extensive use of open-source Linux, MySQL/PostgreSQL, Python, R, C++ and Java in high-performance production roles. However, they are far from restricted to this domain. Python and R, in particular, contain a wealth of extensive numerical libraries for performing nearly any type of data analysis imaginable, often at execution speeds comparable to compiled languages, with certain caveats.
The main benefit of using interpreted languages is the speed of development time. Python and R require far fewer lines of code (LOC) to achieve similar functionality, principally due to the extensive libraries. Further, they often allow interactive console based development, rapidly reducing the iterative development process.
Given that time as a developer is extremely valuable, and execution speed often less so (unless in the HFT space), it is worth giving extensive consideration to an open source technology stack. Python and R possess significant development communities and are extremely well supported, due to their popularity. Documentation is excellent and bugs (at least for core libraries) remain scarce.
Open source tools often suffer from a lack of a dedicated commercial support contract and run optimally on systems with less-forgiving user interfaces. A typical Linux server (such as Ubuntu) will often be fully command-line oriented. In addition, Python and R can be slow for certain execution tasks. There are mechanisms for integrating with C++ in order to improve execution speeds, but it requires some experience in multi-language programming.
While proprietary software is not immune from dependency/versioning issues it is far less common to have to deal with incorrect library versions in such environments. Open source operating systems such as Linux can be trickier to administer.
I will venture my personal opinion here and state that I build all of my trading tools with open source technologies. In particular I use: Ubuntu, MySQL, Python, C++ and R. The maturity, community size, ability to "dig deep" if problems occur and lower total cost of ownership (TCO) far outweigh the simplicity of proprietary GUIs and easier installations. Having said that, Microsoft Visual Studio (especially for C++) is a fantastic Integrated Development Environment (IDE) which I would also highly recommend.
Batteries Included?
The header of this section refers to the "out of the box" capabilities of the language - what libraries does it contain and how good are they? This is where mature languages have an advantage over newer variants. C++, Java and Python all now possess extensive libraries for network programming, HTTP, operating system interaction, GUIs, regular expressions (regex), iteration and basic algorithms.
C++ is famed for its Standard Template Library (STL) which contains a wealth of high performance data structures and algorithms "for free". Python is known for being able to communicate with nearly any other type of system/protocol (especially the web), mostly through its own standard library. R has a wealth of statistical and econometric tools built in, while MatLab is extremely optimised for any numerical linear algebra code (which can be found in portfolio optimisation and derivatives pricing, for instance).
Outside of the standard libraries, C++ makes use of the Boost library, which fills in the "missing parts" of the standard library. In fact, many parts of Boost made it into the TR1 standard and subsequently are available in the C++11 spec, including native support for lambda expressions and concurrency.
Python has the high performance NumPy/SciPy/Pandas data analysis library combination, which has gained widespread acceptance for algorithmic trading research. Further, high-performance plugins exist for access to the main relational databases, such as MySQL++ (MySQL/C++), JDBC (Java/MatLab), MySQLdb (MySQL/Python) and psycopg2 (PostgreSQL/Python). Python can even communicate with R via the RPy plugin!
An often overlooked aspect of a trading system while in the initial research and design stage is the connectivity to a broker API. Most APIs natively support C++ and Java, but some also support C# and Python, either directly or with community-provided wrapper code to the C++ APIs. In particular, Interactive Brokers can be connected to via the IBPy plugin. If high-performance is required, brokerages will support the FIX protocol.
Conclusion.
As is now evident, the choice of programming language(s) for an algorithmic trading system is not straightforward and requires deep thought. The main considerations are performance, ease of development, resiliency and testing, separation of concerns, familiarity, maintenance, source code availability, licensing costs and maturity of libraries.
The benefit of a separated architecture is that it allows languages to be "plugged in" for different aspects of a trading stack, as and when requirements change. A trading system is an evolving tool and it is likely that any language choices will evolve along with it.

The most professional trading platform with open source code.
The M4 trading platform is a professional trading application, featuring real-time quote screens, charting, portfolio tracking, auto-trading, scripting, expert advisors, stock scanning, alerts, and other advanced features.
Buy vs. Build.
Are you paying for a subscription to a platform that you don't own? Are you worried there are critical software problems you can't solve because you don't have the source code?
Are you worried about the risk, time and money required to build a trading platform from scratch?
M4 is a white-label trading application that comes with programming libraries and C# examples for modifying its appearance and functionality.
What you should know:
1. Buying a ready-made, custom-built trading platform is expensive.
2. Building a trading platform from scratch can be even more expensive.
3. Leasing a trading platform creates high and often inescapable switching costs, not to mention never-ending royalty payments.
4. It is limiting and dangerous to be denied access to the source code of your trading platform.
5. However, using free, open-source code is even more dangerous (see our document).
Brokerages, perhaps you are paying for a platform that you don't own. Or are you worried that your competitors are releasing new versions of their platforms so quickly that you can't keep up?
Traders, perhaps you are frustrated with the lack of flexibility and support in your existing off-the-shelf trading software. Are its limited features inadequate for your trading style? Are they holding you back?
The M4 Trading Platform.
The front-end user interface is available in C#, which offers a familiar setting for experienced programmers. The CPU-intensive back end, however, is written in C++ for the best possible performance. Back-end code includes charting features, technical analysis, and a scripting language.
Everything about M4 is completely customizable. All windows, menus, toolbars, charts, and features can be modified, enhanced, or removed with ease. Because you are provided with source code examples and developer documentation, you can make your own modifications or hire developers to code to your specifications.
M4 features multi-timeframe charts, detachable chart windows (to support multiple monitors), auto-trading features, a trend cycle identifier, artificial intelligence features, pattern recognition and much more.
Multiple Configurations.
M4 can be deployed in different configurations designed specifically for various applications, including Professional Trading, Quant Strategy Development, Fund Management and Education.
Professional Trading Edition.
Designed for professional traders, this version features the ability to trade multiple asset classes across multiple brokerages or via direct market access. Traders can back-test and test multiple trading strategies simultaneously, trading strategies can be optimized using genetic algorithms, and traders can create high-frequency auto-trading strategies and much more.
Quant Strategy Development Edition.
This version of M4 allows quant strategy developers to create advanced trading strategies using the R programming language, C++, TradeScript or any language such as C# or VB. This version also features a quant function library and advanced back-testing features, including the ability to back-test multiple petabyte-scale HTP databases via the RMD server.
Fund Management Edition.
The M4 Fund Management Edition has all of the same functionality as the Professional Trading Edition, plus the ability to trade for multiple clients on an individual basis, or via one-to-many copy trading. This version also features a CRM designed for fund managers, a reporting engine that generates client profit & loss reports, plus the ability to connect to any brokerage API or exchange.
Education Edition.
The M4 Education Edition allows educators to teach their proprietary trading strategies and methodologies to students online through a customized application, thereby reducing the dependency on, and cost of, commercial data feeds and off-the-shelf software such as NinjaTrader™, TradeStation™, etc.
The Education Edition features trading strategy protection through double encryption and server-side signal generation, so that proprietary systems can never be cracked or pirated. This version also features a built-in live webinar with an embedded chat room that requires students to "raise their hand" by clicking a button in order to ask questions, plus many other features specific to trading education.
As with all versions of M4, this version can be white labeled and customized. We also provide complete end-to-end turnkey solutions. This version is available in desktop, web, and mobile formats.
Retail Brokerage Edition.
The M4 Retail Brokerage Edition is designed for retail brokerages large and small that offer stocks, futures, forex, options and other asset types.
As a retail brokerage, you are probably paying exorbitant fees for a trading platform that you don't technically own. Or perhaps you have spent tens, if not hundreds, of thousands of dollars building your own platform, which is not only failing to meet your expectations but is still costing a fortune to continue developing and maintaining.
You are not alone. Brokerages around the world have been searching for a better trading platform solution.
The M4 Retail Brokerage Edition is the perfect solution for any retail brokerage. Multiple versions are available for desktop (Windows and Mac), web and mobile apps (Apple and Android) with complete source code, which means there are no annual fees!
M4 Forex MT4™ Bridge Edition.
The M4 Forex MT4 Bridge Edition allows M4 to connect to MT4 servers so that existing Forex brokerages with MT4 licenses can deploy custom applications on the desktop, across the web and on mobile devices such as iPhone, iPad and Android.
The MT4 Bridge Edition features ultra-fast 10ms trade execution with MT4 servers using our proprietary MT4 adapter library written in low level C++ code.
Traders can view their trade history, positions and open orders from a customizable screen. As with all versions of M4, the MT4 Bridge Edition can be white labeled and is fully customizable. Complete source code is available in C#, C++ and JavaScript, and supports dynamic order routing, real-time quotes and historical data. Best of all, the MT4 Bridge Edition is not an imitation or clone of another platform, allowing your firm to stand out by offering a unique, proprietary platform.
Any Brokerage - Any Data Feed.
M4 can be configured to work with any brokerage or data feed. It can be configured to connect directly to an exchange, or to eSignal, Interactive Brokers, TD Ameritrade, FXCM, GAIN Capital, Hotspot, Oanda or any other API.
High Performance.
All CPU-intensive processes in M4 are asynchronous, taking full advantage of multi-core processors. Data loading, neural network training, expert advisor processing and other features make full use of the asynchronous programming design.
We also make it easy to add custom asynchronous features through our AsyncProcess template class.
Most businesses should prefer to buy rather than build: if you build your own product, there is an unacceptable risk. What if the end result is a failure? M4 saves thousands of hours of development time. This translates into a faster time to market, lower costs and a higher ROI. M4 comes with full support. Software developers will receive technical support, setup and training, source code updates and helpful advice throughout the duration of their source code subscription. Perhaps most importantly, you can earn substantial revenue with M4 by signing up for our Value Added Reseller Program.
StockChartX Charting Engine.
We asked more than 1,200 traders which features and technical indicators they wanted in StockChartX. There were many valuable feature requests, and we added them all.
StockChartX features real-time, tick-by-tick charting with High-Low-Close bars, Open-High-Low-Close bars, 2D and 3D candlestick charts, Renko, Kagi, Three Line Break, Point & Figure, Candle-Volume, Equi-Volume, shaded Equi-Volume, Heikin Ashi candlesticks, Darvas boxes and other price styles.
You can chart real-time market data; insert buy, sell or exit symbols; insert text, trend lines, custom images, multiple indicators and overlay indicators (sharing scales); display charts with semi-log or linear scaling; print charts; save charts as images; save/load charts as binary files; and much more.
StockChartX is the original C++ charting library, used by more than 3,000,000 traders.
Technical Analysis Indicators.
M4 features more than 80 popular technical indicators that can be customized with user-defined parameters. Our technical indicators have been validated by their authors wherever possible, so you can be confident that the calculations are correct. That is why our technical indicator library has won numerous awards from Futures magazine and Stocks & Commodities magazine.
Chart Pattern Recognition.
M4 features a fully dynamic, template-driven pattern recognition engine for identifying Channels, Double Bottoms, Double Tops, Flags, Head & Shoulders, Pennants, Trend, Triangles, Triple Bottoms, Triple Tops, Wedges and other patterns. Create custom patterns using the supplied pattern designer utility.
Expert Advisors.
Desenvolva seus próprios consultores especializados ou selecione um dos muitos consultores especializados previamente definidos incluídos no banco de dados do sistema comercial.
Outras características.
1. Tela de cédula de buffer duplo com cartilhas de Thumbnail ao vivo.
2. Tela de Gerenciador de Portfólio e Entrada de Pedido (vinculável a qualquer corretora)
3. Tela de gráficos com análise técnica.
4. Advanced Chart Pattern Recognition Built into the Charting Screen.
5. Neural Network Technical Indicators.
6. Expert Advisors and Consensus Reports.
7. Back Testing via TradeScript.
8. Real-Time Alerts via TradeScript.
9. Stock Scanning via TradeScript.
10. Import/Export to and from Excel, Including Indicator Values.
11. Straight-Through, Direct-Access API Adapter Class with Development Support.
12. Back-End Administrator Application to Generate License Keys, Send Instant Messages, Generate P&L Reports, and much more!
Deliverables.
Source code for the entire trading platform; source code for the other components, including charting, technical indicators and more; our SuperWebSocket data server; our MyExchange exchange engine; an administrator application for test keys, account reports and instant messaging; a mobile charting interface; and much, much more!
Chat, news, media sharing and chart sharing features.
Developer Support.
We provide developer setup and training via desktop sharing, so you can run the M4 platform immediately after purchasing the license. Technical support and source code updates are provided for one year and can be renewed. Contact us to get started today.
Copyright © 2002-2018 by Modulus Global, Inc. All rights reserved.

1 Part I – Background.
Traditionally, trading is done manually: a trader opens or closes a position by hand, or at least calls a broker to do so. Benjamin Graham once mentioned that many great investors with outstanding investment records always repeat that an investor’s largest enemy is himself. Warren Buffett also said that a successful investor is one who has the right temperament and the right psychology. Manual trading is not only vulnerable to traders’ psychological and emotional fluctuations, but also very inefficient in terms of trading speed and convenience.
Thanks to advances in computing technology, almost all financial assets can now be traded electronically. An automated trading system uses computers to develop and test strategies and to trade financial assets automatically. It can help novice traders avoid emotional trading and help experienced traders make their trading more efficient and systematic. Automated trading is widely used in the financial industry and has become indispensable for many investors. It also makes markets more liquid and reduces trading costs accordingly.
In recent years, online trading platforms have also become a hot spot of financial engineering innovation. Many financial technology companies, such as Quantopian, Quantconnect and Motif Investing, have raised considerable funds from Wall Street. Hedge funds like WorldQuant also provide online simulation and trading environments for individual traders. Some of these platforms are beautifully designed and very user friendly. But when you backtest your strategies, they actually run on the platform’s servers and are therefore fully visible to the company. To avoid the risk of exposing your strategies, it is safer to do research on a local machine and trade through reliable brokers or DMA. In addition, online platforms transfer data over the Internet via HTTP, which may be acceptable for low frequency trading but is not efficient or feasible for high frequency trading.
Sentosa is named after the most popular island resort in Singapore. The languages I used to write Sentosa include C++, Python, R, Go and Javascript. The project is hosted at Quant365, where you can download the source code and follow all the updates.
There are three subprojects in Sentosa:
Sentosa trading system is a multithreaded, message-driven, highly scalable, high frequency automatic trading system. The latency can be as low as 100 milliseconds, depending on the distance between you and the trading venue’s servers. Currently, the trading venue is IB, so an IB account is required. Thanks to its modular design, it can easily be extended to support other trading venues. The algorithm module can be written in any language that supports either the nanomsg or the websocket protocol. I have implemented language bindings for Python and R for illustration purposes. It is very easy to add support for other languages such as Java, MATLAB, Haskell, Go and C#. The market data module subscribes to trade and quote (TAQ) data, so some books and papers would categorize Sentosa as a technical automatic trading system, in contrast with a fundamental automatic trading system, which mainly uses fundamentals as trading signals. I don’t think this categorization makes much sense, because a signal is just an output of the algorithm module and anything can be a signal: a technical indicator, a fundamental ratio, a macroeconomic index, social media news, Google Trends and so on.
Sentosa research platform is essentially an interactive computing environment based on Jupyter. Later I will demonstrate how to use R and Python to do volatility research on the platform.
In addition, I developed a web platform for Sentosa with Django and Tornado, through which you can monitor Sentosa and send orders using a web interface.
I use Sentosa to do research and trading for myself. Although it can be used for real trading, I disclaim all responsibility for any loss on any trade made through Sentosa. But if it helps you make money, I don’t mind being treated to a cup of coffee. Sentosa is an ongoing project and more features will be added in the future. I will also discuss the future direction of each subproject.
2 Part II – Sentosa Trading System.
2.1 Design Overview.
When designing Sentosa trading system, my emphasis was on configurability, modularity and scalability. In folder /.sentosa, there is a YAML-format configuration file named sentosa.yml, which you can use to customize the system. The only requirement is that you set your own IB account in the global section for paper or real trading.
Sentosa trading system is mainly composed of five modules: market data module, OMS module, algorithm module, record module and simulation module. These modules are purposely decoupled, and all communication goes through the messaging system. The trading system also has four running modes: record, trade, simulation and merlion, which represent different combinations of the five modules.
Figure 1 is the program workflow graph of Sentosa trading system.
Workflow of Sentosa Trading System.
2.1.1 Running Mode.
Sentosa can run in four modes, which are defined as follows:
record: Do not trade; just record all the market information into a simulation file for future use.
trade: Launch all Sentosa modules and trade.
simulation: Replay a historical scenario. This is for backtesting your algorithm in a simulation environment.
merlion: The same as trade mode except that it does not generate a simulation file. You cannot replay your current trading session because no simulation file is generated.
The running mode can be configured in the global section of sentosa.yml.
2.1.2 Multithreads and Messaging System.
Sentosa is a multithreaded application implemented with C++14 threads. All the threads are created on the heap and the pointers are stored in a vector. Initially I developed Sentosa on Windows and used ZMQ as the internal messaging protocol. But when I tried to port it to Linux, ZMQ did not work well with threads there: it created more than ten threads automatically and somehow interfered with IB’s threads. I filed a ZMQ bug report, and so far the issue has not been resolved.
Nanomsg was created by the same author as a better alternative to ZMQ. It is simpler to use and has no such issue in a multithreaded environment. I replaced all ZMQ code with nanomsg and chose nanomsg as my internal messaging protocol.
2.1.3 Modules.
With nanomsg as the internal messaging protocol, I decouple the system into five basic modules: market data module, order management system module, algorithm module, record module and simulation module. These modules coexist in one process but in different threads. They communicate with messaging system and can be turned off and on according to the four running modes described above. Modular design makes the system scalable and easier for future development.
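The decoupling described above can be illustrated with a small Python sketch. It is only an analogy: Sentosa uses nanomsg between C++ threads, whereas this sketch uses Python's standard `queue` module as a stand-in for the messaging layer, and all function names and the toy "buy below a threshold" rule are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel message that tells a module thread to shut down


def market_data_module(out_q, ticks):
    """Publish each tick as a message, then signal the end of the stream."""
    for tick in ticks:
        out_q.put(tick)
    out_q.put(STOP)


def algorithm_module(in_q, out_q, threshold):
    """Turn ticks into order messages; it only knows about its two queues."""
    while True:
        msg = in_q.get()
        if msg is STOP:
            out_q.put(STOP)
            return
        symbol, price = msg
        if price < threshold:  # toy rule, for illustration only
            out_q.put(("BUY", symbol, price))


def oms_module(in_q, sent_orders):
    """Collect the orders that a real OMS would forward to the broker."""
    while True:
        msg = in_q.get()
        if msg is STOP:
            return
        sent_orders.append(msg)


def run_pipeline(ticks, threshold):
    md_to_algo, algo_to_oms = queue.Queue(), queue.Queue()
    sent_orders = []
    threads = [
        threading.Thread(target=market_data_module, args=(md_to_algo, ticks)),
        threading.Thread(target=algorithm_module,
                         args=(md_to_algo, algo_to_oms, threshold)),
        threading.Thread(target=oms_module, args=(algo_to_oms, sent_orders)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sent_orders


orders = run_pipeline([("SPY", 221.0), ("SPY", 219.5), ("SPY", 218.0)],
                      threshold=220.0)
```

Because each module only reads from and writes to queues, any one of them can be replaced (for example by a replay source instead of live market data) without touching the others, which is the property the running modes rely on.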
The first three modules represent the three most basic components of an automatic trading system. In the following sections, I will describe these three modules one by one.
2.2 Market Data Module.
2.2.1 Introduction of Market Data.
Market data module is one of the most important components of a trading system. Generally, market data include tick-level information about the prices and sizes of bids, asks and completed trades. Data vendors sometimes provide extra information such as tags or exchange names. There are two levels of market data, distinguished by the information they provide.
Level 1 market data provide the most basic information: the bid/ask price and size, and the last traded price and size. From the order book point of view, this information comes from the top of the book, so level 1 market data are also known as top-of-book data.
Level 2 market data, also called order book or market depth, provide extra information about part or all of the order book. The order book has two long queues, one of bid orders and one of ask orders. The queues cancel each other at the top and grow when new limit orders come in. The length of a queue is called the depth of the order book. The order book changes very fast for liquid stocks, so the amount of information can be overwhelming.
Most individual traders use level 1 market data. Level 2 market data are crucial for day traders, especially low latency, high frequency traders. There has been a lot of academic research on level 2 market data in recent years.
IB has its own way of delivering market data. Loosely speaking, IB provides both level 1 and level 2 market data: reqMktData requests level 1 market data and reqMktDepth requests level 2 market data. In addition to the raw data, IB also provides real time bar data via the function reqRealTimeBars. The real time bar data, like the historical bar data, provide open, high, low and close (OHLC) prices, volume weighted average price (VWAP) and trade count information.
Please note that IB doesn’t provide true tick-level data. The market data are actually consolidated every 300 milliseconds or so and sent back to the client upon request. As we are not doing ultra-low latency trading and are not considering tick-level dynamics, a combination of level 1 data and 5-second real time bar data should be enough.
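To make the bar fields concrete, here is a minimal Python sketch of how trades could be aggregated into 5-second bars with OHLC, volume, VWAP and trade count. IB computes its real time bars on the server side; the function `build_bars` and the tick tuple layout `(timestamp_seconds, price, size)` are assumptions made purely for illustration.

```python
def build_bars(ticks, bar_seconds=5):
    """Aggregate (timestamp_seconds, price, size) trades into fixed-width bars."""
    bars = {}
    for ts, price, size in sorted(ticks, key=lambda t: t[0]):
        start = int(ts // bar_seconds) * bar_seconds  # left edge of the bar
        bar = bars.get(start)
        if bar is None:
            bars[start] = {"time": start, "open": price, "high": price,
                           "low": price, "close": price, "volume": size,
                           "notional": price * size, "count": 1}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += size
            bar["notional"] += price * size
            bar["count"] += 1
    result = []
    for start in sorted(bars):
        bar = bars[start]
        notional = bar.pop("notional")
        # VWAP = traded notional / traded volume; fall back to close if no volume
        bar["wap"] = notional / bar["volume"] if bar["volume"] else bar["close"]
        result.append(bar)
    return result


ticks = [(0.5, 10.0, 100), (2.0, 10.5, 200), (4.9, 10.2, 100), (5.1, 10.4, 50)]
bars = build_bars(ticks)
```

The first three trades fall into the bar starting at second 0 and the last one into the bar starting at second 5.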
2.2.2 Threads.
In Sentosa trading system, market data module involves the following threads:
2.2.2.1 Thread_MKDataTick.
Thread_MKDataTick connects to IB to request two kinds of data:
IB’s tick level real time market data (by reqMktData)
IB’s 5 seconds real time TRADE bar data (by reqRealTimeBars)
When data arrive from IB, they are sent to thread Thread_UpdateSboard to update the scoreboard, a global data structure implemented as a singleton in scoreboard.h/cpp.
2.2.2.2 Thread_MKDepth.
This thread gets level 2 market data by calling the IB API function reqMktDepth(). TWS currently limits users to a maximum of 3 distinct market depth requests. The same restriction applies to API clients; however, API clients may make multiple market depth requests for the same security. Due to this limitation, many algorithms involving order book dynamics cannot be used.
2.2.2.3 Thread_UpdateSboard.
This thread updates the scoreboard whenever a market data message arrives.
When Sentosa trading system is running in simulation mode, the market data can come from a simulation file, also known as a replay file.
2.3 Algorithm Module.
Sentosa trading system provides a framework for traders to write their strategies. This framework is called the algorithm module. It communicates with the OMS module through the messaging system. Not many traders are programming experts, but most know at least one programming language well enough to implement their strategies. The languages most frequently used by traders include R, Matlab, Python and VBA (Excel). Sentosa trading system is message driven and designed with multi-language support in mind: any language that supports nanomsg or websocket can be used to write trading algorithms.
Currently Sentosa supports algorithm modules written in three languages: C++, Python and R. These three languages represent the three ways the algorithm module can work in Sentosa.
2.3.1 C++.
Traders using C++ mostly have strong programming skills and higher requirements for the trading system’s performance and speed. In Sentosa trading system, the algorithm module is built into a static library and then used to generate the final executable binary.
All algorithms in Sentosa trading system inherit from an abstract base class AlgoEngine . Factory pattern is used to create algorithm objects:
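The idea of creating algorithm objects by name through a factory can be sketched as follows. This is a Python illustration of the pattern only, not Sentosa's C++ code: the class name `AlgoEngine` comes from the text, while the `on_bar` method, the registry and `create_algo` are hypothetical.

```python
from abc import ABC, abstractmethod


class AlgoEngine(ABC):
    """Abstract base class for strategies (interface assumed for illustration)."""

    def __init__(self, universe):
        self.universe = list(universe)

    @abstractmethod
    def on_bar(self, symbol, bar):
        """Return an order instruction for this bar, or None to do nothing."""


_REGISTRY = {}  # maps a strategy name to its class


def register(name):
    """Class decorator that records a strategy under a configuration name."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("ta_indicator_raffles")
class TaIndicatorRaffles(AlgoEngine):
    def on_bar(self, symbol, bar):
        return None  # placeholder: a real strategy would compute signals here


def create_algo(name, universe):
    """Factory: build the strategy object named in the configuration."""
    try:
        cls = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown strategy: {name}") from None
    return cls(universe)


algo = create_algo("ta_indicator_raffles", ["SINA", "ATHM", "FXI"])
```

With this arrangement the rest of the system only needs the strategy name from the configuration file and never refers to a concrete strategy class directly.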
In the Sentosa configuration file sentosa.yml, there is a strategy section that specifies your strategy name and trading universe. Take the following as an example:
It means there is a strategy called ta_indicator_raffles and the trading universe includes 10 stocks/ETFs (SINA, ATHM…FXI).
I named the strategy ta_indicator_raffles for illustration, so that you can see it is a strategy using technical analysis. In real trading, traders normally give their strategies totally irrelevant names.
Technical analysis (TA) indicators are extremely popular with individual traders, who normally use them in low frequency trading. Many rules of thumb for TA indicators are only applicable in a low frequency trading environment; for high frequency trading, you may need to make some adjustments. Take RSI (Relative Strength Index), an extremely popular indicator developed by J. Welles Wilder Jr., as an example.
RSI is defined as:
\[ RSI = 100 - \frac{100}{1 + RS} \] where \[ RS = \frac{\text{Average Gain}}{\text{Average Loss}} \]
According to Wilder, RSI is considered overbought when above 70 and oversold when below 30. With 15-second bar data, for stocks that do not trade very frequently, RSI can become very high or very low because there are many periods without a price change. There are two solutions. The first is to use more time periods so that neither Average Gain nor Average Loss is equal to 0. The other is to set RSI equal to 50 if the price changes are too few. In other words, momentum is not obvious when there is no price change information, so we just give RSI a value of 50. The following is a C++ implementation of the second idea: if the number of price changes is less than 10, just set RSI to 50.
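As a language-neutral illustration of the same rule (not the author's C++ code), here is a minimal Python sketch. It assumes simple averages over the last `period` price changes rather than Wilder's smoothed averages, and the function name and parameters are hypothetical; only the threshold of 10 price changes and the neutral value of 50 come from the text.

```python
def rsi(prices, period=14, min_changes=10, neutral=50.0):
    """RSI over the last `period` price changes, using simple averages.

    Returns `neutral` when there is not enough history or when fewer than
    `min_changes` of the last `period` changes are non-zero.
    """
    if len(prices) < period + 1:
        return neutral
    window = prices[-(period + 1):]
    deltas = [b - a for a, b in zip(window, window[1:])]
    nonzero = sum(1 for d in deltas if d != 0)
    if nonzero < min_changes:
        return neutral  # too little price movement to read any momentum
    avg_gain = sum(d for d in deltas if d > 0) / period
    avg_loss = sum(-d for d in deltas if d < 0) / period
    if avg_loss == 0:
        return 100.0  # only gains in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


flat = [10.0] * 15
rising = [float(p) for p in range(1, 16)]
mixed = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16, 15, 17, 16, 18, 17]
```

For the `flat` series there are no price changes, so the function returns the neutral value; for `mixed`, gains are twice as large as losses, so RS is 2 and RSI is 100 - 100/3.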
Some TA indicators that work well in low frequency trading do not work at all in high frequency trading. One reason is that market data such as TAQ are too sparse at high frequency, especially for assets with low liquidity. Another reason is that market noise is significant, sometimes dominant, in high frequency trading. Too many unpredictable factors make the real price trend unclear. In this case, more research and backtesting are needed to find out what the real value of the traded asset is and how long it takes for the noise to disappear.
There is a TA library called ta-lib, written in C++ and also available in other languages such as Python and Go. Sentosa includes a development version of ta-lib, 0.6.0dev. You can also download ta-lib version 0.4 from ta-lib, which is more stable but has fewer TA indicators.
2.3.2 Python.
Traders using Python do not have very high requirements for execution speed and system performance. I developed a Python package called Pysentosa, which uses the nanomsg protocol to connect to the market data module and the websocket protocol to connect to the OMS. The demo code is as follows:
This code demonstrates a simple algorithm:
Set a price range with a lower bound of 220 and an upper bound of 250. If SPY’s ask price is lower than 220, try to buy 50 shares. If the BUY order gets filled, decrease the lower bound by 20 and wait to buy another 50 shares until the ask price falls below 200. If instead the bid price is greater than the upper bound, send a SELL order for 100 shares of SPY. If it gets filled, increase the upper bound by 20 and wait to sell until the bid price rises above the new upper bound of 270. This algorithm can be used by institutional traders to split big orders.
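The decision logic of this algorithm can be sketched in plain Python, independent of any trading API. This is not the Pysentosa demo code: the class `RangeTrader` and its methods are hypothetical, and the sketch deliberately leaves out order submission and connectivity. Only the numbers (220, 250, step 20, 50 and 100 shares) come from the description above.

```python
class RangeTrader:
    """Buy below a lower bound, sell above an upper bound, widen after fills."""

    def __init__(self, lower=220.0, upper=250.0, step=20.0,
                 buy_qty=50, sell_qty=100):
        self.lower = lower
        self.upper = upper
        self.step = step
        self.buy_qty = buy_qty
        self.sell_qty = sell_qty

    def on_quote(self, bid, ask):
        """Return a signed order quantity (+ buy, - sell) or 0 for no action."""
        if ask < self.lower:
            return self.buy_qty
        if bid > self.upper:
            return -self.sell_qty
        return 0

    def on_fill(self, quantity):
        """Move the bound that triggered the order once it has been filled."""
        if quantity > 0:
            self.lower -= self.step   # next buy only below the new lower bound
        elif quantity < 0:
            self.upper += self.step   # next sell only above the new upper bound


trader = RangeTrader()
first = trader.on_quote(bid=219.0, ask=219.5)   # ask below 220: buy signal
trader.on_fill(first)                           # lower bound becomes 200
second = trader.on_quote(bid=219.0, ask=219.5)  # inside the range: no signal
third = trader.on_quote(bid=251.0, ask=251.2)   # bid above 250: sell signal
trader.on_fill(third)                           # upper bound becomes 270
```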
Pysentosa is not only a message interface to Sentosa; it also includes a Sentosa trading system runtime. I use boost.python to wrap Sentosa trading system into a dynamic library, which runs as a daemon when you create a Merlion object. In other words, Pysentosa is a complete, full-featured trading system.
2.3.3 R.
In contrast with Pysentosa, I also developed rsentosa in the R language to demonstrate another way to use Sentosa. rsentosa is for traders using R, who normally have a strong statistics background. rsentosa uses the nanomsg protocol to communicate with both the OMS and the market data module. The demo code is as follows:
The algorithm is almost the same as the Python version, except that it does not sell SPY no matter what the bid price is.
2.4 Order Management System.
An OMS (order management system) is a software system that facilitates and manages order execution, typically through the FIX protocol. In Sentosa, the OMS module gets orders from the algorithm module and sends them to IB. IB gets orders from the Sentosa OMS and executes them using its smart routing technology. IB API supports two basic types of orders: limit orders and market orders.
A limit order has a price limit, which guarantees that the execution price cannot be worse than it. For every stock, the exchange maintains a limit order book including all the bid/ask prices, volumes and timestamp information. Please note that the trade price can be more favorable than the limit price. For example, if you send a limit order to sell Google stock at 1 dollar per share, the system will fill it at the bid price at the top of the book, which will be higher than 1 dollar.
A market order itself carries no price information. When a market order is sent to an exchange, the order matching engine finds the best currently available price at which to execute it. A market order is normally filled immediately by matching a limit order at the top of the order book. Two market orders cannot be matched against each other because neither carries a price.
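The matching behavior described above can be illustrated with a toy order book in Python. This is a simplified sketch of price-time priority matching, not how IB or any exchange is implemented; the class `OrderBook` and its `submit` method are hypothetical, and unfilled market order quantity is simply dropped.

```python
import heapq


class OrderBook:
    """Toy limit order book with price priority, then time priority."""

    def __init__(self):
        self._bids = []  # heap of (-price, seq, qty): best (highest) bid first
        self._asks = []  # heap of (price, seq, qty): best (lowest) ask first
        self._seq = 0    # arrival counter used for time priority

    def _rest(self, side, price, qty):
        self._seq += 1
        if side == "BUY":
            heapq.heappush(self._bids, (-price, self._seq, qty))
        else:
            heapq.heappush(self._asks, (price, self._seq, qty))

    def submit(self, side, qty, limit=None):
        """Match an incoming order; limit=None means a market order.

        Returns a list of (price, qty) fills. Unfilled limit quantity rests
        in the book; unfilled market quantity is dropped.
        """
        book = self._asks if side == "BUY" else self._bids
        fills = []
        while qty > 0 and book:
            key, seq, resting_qty = book[0]
            price = key if side == "BUY" else -key
            if limit is not None:
                crosses = price <= limit if side == "BUY" else price >= limit
                if not crosses:
                    break
            traded = min(qty, resting_qty)
            fills.append((price, traded))  # trade at the resting order's price
            qty -= traded
            if traded == resting_qty:
                heapq.heappop(book)
            else:
                heapq.heapreplace(book, (key, seq, resting_qty - traded))
        if qty > 0 and limit is not None:
            self._rest(side, limit, qty)
        return fills


book = OrderBook()
book.submit("BUY", 100, limit=700.0)
book.submit("BUY", 100, limit=699.5)
sell_fills = book.submit("SELL", 150, limit=1.0)   # sell limit far below bids
empty_fills = book.submit("BUY", 10)               # market buy, no asks resting
book.submit("SELL", 20, limit=701.0)               # rests as the best ask
market_fills = book.submit("BUY", 10)              # market buy hits that ask
```

The sell limit order at 1 dollar is filled at the resting bid prices, which are far better than its limit, and the market buy that arrives while no ask is resting gets no fill because there is no price to trade at.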
2.4.1 OMS Design and Messaging Protocol.
OMS accepts two types of protocols: nanomsg and websocket.
Thread Thread_API_NN monitors and handles any incoming nanomsg message at the port specified as NN_MON_PORT in sentosa.yml.
Thread Thread_API_WS monitors and handles any incoming websocket message at the port specified as WS_MON_PORT in sentosa.yml.
OMS handles the two protocols with the same logic. I use C++ function overloading to handle the differences. The interface definition is in api_core.cpp and the implementations are in api_nn.cpp for nanomsg and api_ws.cpp for websocket.
Sentosa is a multithread application where there are four threads in OMS module:
For performance reasons, Sentosa preallocates a static array of 283 orders for each instrument. In other words, one instrument can send at most 283 orders with different order ids (order replacements are not counted because the order id stays the same). This number should be enough for individual traders. Sentosa OMS uses nanomsg as the communication protocol and receives nanomsg text as instructions.
Sentosa OMS opens a NN_PAIR socket at the following endpoint:
You can customize the port by changing ALGO_TO_OMS_PORT in sentosa.yml.
The protocol specification is also customizable through sentosa.yml. Take the default sentosa.yml configuration as an example:
To close all your current positions with market orders. This is triggered when a nanomsg text starting with “e” is received.
To close one instrument’s position as soon as possible. The nanomsg format is f|SYMBOL. For instance, “f|IBM” means to close your current IBM position with a market order.
To cancel all your current outstanding orders for one instrument. The nanomsg format is g|SYMBOL.
To send a limit order.
The format is l|SYMBOL|Quantity|Price|AllowedMove|OID, where:
Quantity is a signed integer. A positive sign means BUY and a negative sign means SELL.
Price is the limit price.
AllowedMove is the price range within which the order is still considered valid. In Sentosa OMS, if the market price moves too far from the limit price, the order will be cancelled by OMS. The logic can be expressed with the following pseudo-code:
OID is the order id.
To send a market order. The format is m|SYMBOL|Quantity|OID.
To check the status of an order by order id. The message format is i|OID. For instance, “i|1634223” is a request for OMS to return the status of the order with id 1634223. OMS replies to the client in the format “i|OID|ORDERSTATUS”. If the order doesn’t exist at all, OMS sends back -1. If OMS sends back “i|1634223|4”, it means the order with id 1634223 has a status of SUBMITTED.
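The message formats listed above are simple enough to build and parse with a few helper functions. The following Python sketch uses the default prefixes from the text (l, m, i); the helper names are hypothetical, and `should_cancel` is only one plausible reading of the AllowedMove rule (cancel when the market price is further than AllowedMove from the limit price), not the author's actual logic.

```python
def limit_order_msg(symbol, quantity, price, allowed_move, oid):
    """Build 'l|SYMBOL|Quantity|Price|AllowedMove|OID' (quantity is signed)."""
    return f"l|{symbol}|{quantity}|{price}|{allowed_move}|{oid}"


def market_order_msg(symbol, quantity, oid):
    """Build 'm|SYMBOL|Quantity|OID' (quantity is signed)."""
    return f"m|{symbol}|{quantity}|{oid}"


def status_request_msg(oid):
    """Build 'i|OID'."""
    return f"i|{oid}"


def parse_status_reply(text):
    """Parse 'i|OID|ORDERSTATUS' into an (oid, status) pair of integers."""
    tag, oid, status = text.split("|")
    if tag != "i":
        raise ValueError(f"not a status reply: {text!r}")
    return int(oid), int(status)


def should_cancel(limit_price, market_price, allowed_move):
    """Assumed AllowedMove rule: cancel once the market has moved too far."""
    return abs(market_price - limit_price) > allowed_move


sell_msg = limit_order_msg("IBM", -100, 150.25, 0.5, 1634223)
reply = parse_status_reply("i|1634223|4")
```

A negative quantity in `sell_msg` encodes a SELL, following the sign convention given for the Quantity field.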
Order statuses are defined as follows:
You can refer to IB document for the details of order status:
2.5 Future Direction.
Sentosa trading system can be extended in several ways:
From multithread to multiprocess.
From single machine to cluster.
From IB to other trading venues, or direct market access(DMA) if possible.
More languages support.
More modules support - risk management module, portfolio management module.
3 Part III – Sentosa Research Platform.
3.1 Introduction.
Sentosa Research Platform is a web-based interactive computing platform based on Jupyter with Python and R support. You can set it up on your local machine and do research with your own data. The following is a screenshot:
Sentosa Research Platform.
In the following sections, I will discuss financial data selection, collection and management. Then I will showcase two research tasks using R and Python respectively. The first is GARCH-family volatility comparative study with low frequency data and the second is true volatility calculation with high frequency data.
3.2 Data Selection, Collection and Management.
Successful trading starts with good quality data. With good quality data, particularly quantitative data, a trader can do meaningful research. For equity trading, commonly used data types include trade data, quote data, fundamental data, macroeconomic data, risk factor data, news data, social media data and option data. Daily OHLC trade data and some macroeconomic data are normally available for free. Most other data are not free, and some are expensive because of the information edge traders can get from them.
For paid data services, you need to choose whether to pay for processed data, raw data or both. Processed data (e.g. PE/PB ratios) are more convenient and ready to be used directly. With raw data (e.g. tick and quote data), you need to write programs to clean them and to calculate indicators or risk factors with your own algorithms, some of which may be computationally intensive. The advantage of raw data is their flexibility and their potential to give a trader more of an information edge.
Data can be stored in the file system in plain text format. Many time series are just csv files, which can be used very conveniently from many languages. For large data series, databases such as MSSQL, MySQL and MongoDB can be used. Data are stored in tables or documents, and indexes are created for faster queries. For higher performance time series processing, you can choose a commercial database such as KDB+, One Tick or eXtremeDB.
There are many commercial data vendors, such as Thomson Reuters and Bloomberg, but most of them are prohibitively expensive for individuals. In this project, using MySQL as data storage and IB as data source, I developed a historical data collection tool called histData, which I describe below.
3.2.1 Historical Data Collection Tool - histData.
In this project, I use four tables to store four time series data:
The table structure is the same for each table. For example, the following is the structure of table bar1d :
The following are three rows in table bar15s :
The first row means that from 2013-Dec-06 09:30:00 to 2013-Dec-06 09:30:15, 8 trades occurred for BITA with a WAP of 30.21, a trading volume of 25K, an open price of 30.27, a highest price of 30.27, a lowest price of 30.16 and a close price of 30.25.
For stocks, historical data requests that use a bar size of “30 secs” or less can only go back six months. IB also limits the request rate to no more than 60 historical data requests in any 10-minute period. Given this limitation, I think IB presumably uses a traffic control algorithm such as token bucket on the server side. On the client side, to avoid pacing violations, our data collector sleeps for 1 minute after sending 6 requests. This is customizable in the configuration file sentosa.yml. The following is what I used in my configuration file:
If histDataSleepT is equal to 30000, histDataReqNum should be equal to 3, which means sleeping for 30 seconds after every 3 requests. histDataBackMN is the number of months, counting back from now, for which you want to collect data. In the above example, if today is 2015-Dec-31, we want to collect data for the period from 2015-Jul-01 to 2015-Dec-31.
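The client-side pacing rule can be sketched in a few lines of Python. The class `Pacer` and the helper `steady_rate_per_10_min` are hypothetical; the parameters mirror the configuration keys histDataReqNum and histDataSleepT (in milliseconds) from the text. The sleep function is injectable so the logic can be exercised without actually waiting.

```python
import time


class Pacer:
    """Sleep `sleep_ms` milliseconds after every `req_num` requests."""

    def __init__(self, req_num=6, sleep_ms=60000, sleep=time.sleep):
        self.req_num = req_num
        self.sleep_ms = sleep_ms
        self._sleep = sleep   # injectable, so tests do not have to wait
        self._sent = 0

    def before_request(self):
        """Call right before each historical data request."""
        if self._sent and self._sent % self.req_num == 0:
            self._sleep(self.sleep_ms / 1000.0)
        self._sent += 1


def steady_rate_per_10_min(req_num, sleep_ms):
    """Long-run requests per 10 minutes, ignoring the time requests take."""
    return req_num * 600000 / sleep_ms


naps = []  # records the sleeps instead of performing them
pacer = Pacer(req_num=6, sleep_ms=60000, sleep=naps.append)
for _ in range(13):
    pacer.before_request()
```

Both settings mentioned in the text, 6 requests per 60000 ms and 3 requests per 30000 ms, work out to the same long-run average of 60 requests per 10 minutes.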
Next, I will showcase how to use Sentosa Research Platform to do quantitative research on volatility. Case 1 is about parametric models of volatility using low frequency data. Case 2 is about nonparametric models using high frequency data with market microstructure noise.
3.3 Case 1: Volatility Forecasting Comparative Study (R)
Volatility is so important that it is widely used in trading, pricing and risk management. Christian Brownlees, Rob Engle and Bryan Kelly published a paper called A Practical Guide to Volatility Forecasting Through Calm and Storm, which concludes that model rankings are insensitive to forecast horizon.
To verify the conclusion of this paper, I use the Quandl library to get S&P 500 index data from 1950-Jan-03 to 2011-Mar-18 and use an R program to compare 5 GARCH models: GARCH, NGARCH, TGARCH, APARCH and eGARCH.
Of the 5 models, GARCH fails to explain the asymmetry of the distribution of errors and the leverage effect. eGARCH and TGARCH are able to handle the leverage effect where returns have negative skewness. NGARCH and APARCH are able to handle the leverage effect for both negative and positive skewness.
The code is written in R language as follows:
The code above defines the quasi-likelihood (QL) loss function proposed in the original paper, with which we can compare the models’ predictive ability. It then gets data from Quandl, defines the model specifications, fits each model and predicts with it, and finally draws a graph of the QL loss values. The out-of-sample length is 50 days. The forecast horizons I have chosen are 1, 10, 20, 30, 40 and 50 days. I compare the five models’ predictive ability at these forecast horizons.
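For reference, the QL loss used in the Brownlees, Engle and Kelly paper has the form \( \hat\sigma^2 / h - \log(\hat\sigma^2 / h) - 1 \), where \(\hat\sigma^2\) is a variance proxy (for example a squared return or a realized variance) and \(h\) is the variance forecast. A minimal Python sketch follows; it is not the R code referred to above, and the function names are illustrative.

```python
import math


def ql_loss(proxy_var, forecast_var):
    """Quasi-likelihood loss of one variance forecast against a variance proxy.

    Both arguments must be strictly positive; the loss is 0 when they agree.
    """
    ratio = proxy_var / forecast_var
    return ratio - math.log(ratio) - 1.0


def mean_ql_loss(proxy_vars, forecast_vars):
    """Average QL loss over an out-of-sample period."""
    losses = [ql_loss(p, f) for p, f in zip(proxy_vars, forecast_vars)]
    return sum(losses) / len(losses)


under = ql_loss(0.04, 0.02)  # forecast is half the proxy
over = ql_loss(0.04, 0.08)   # forecast is double the proxy
```

The loss depends only on the ratio of proxy to forecast and is asymmetric: forecasting half of the proxy variance (`under`) costs more than forecasting double (`over`).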
Assuming that the return distribution is normal, running the code above shows that when the forecast horizon is 30 days or less:
When forecast horizon is greater than 30, no ranking pattern is observed.
The result is at Figure 3.
GARCH Family Models with Normal Distribution.
As we know, the distribution of stock returns is closer to a Student’s t distribution than to a normal distribution. Now assume that the return distribution is Student’s t; in the code, we need to change the model specification from norm to std:
Running the code above shows that when the forecast horizon is 30 days or less:
When forecast horizon is greater than 30, no ranking pattern is observed.
The result can be seen from figure 4:
GARCH Family Models with Student Distribution.
The result verifies that the model ranking doesn’t change as the forecast horizon changes, as long as the horizon is not too large. This can be explained by the characteristics of each model. For example, both TGARCH and eGARCH consider the positive skew leverage effect, so they have almost the same loss function value. NGARCH and APARCH can explain both positive and negative skewness, which is why they have a higher loss function value than TGARCH and eGARCH.
The result also verifies another piece of empirical knowledge: compared with other GARCH-family models, the plain GARCH model is good enough. When we use the Student’s t distribution as the model distribution, GARCH ranks number 1. When we use the normal distribution, GARCH ranks number 2. This is another example of the simplest model being the most powerful.
3.4 Case 2: Volatility with High Frequency Data (Python)
3.4.1 Theory and Concept.
Assume the stock price follows geometric Brownian motion: \[ S_t = S_0 \cdot \exp\left(\sigma W_t + \left(\mu - \frac{\sigma^2}{2}\right) \cdot t\right) \]
Then the stock return \(R_i = \log(S_{t_i}) - \log(S_{t_{i-1}})\) is normally distributed. In one unit of time \(0=t_0<t_1<t_2<\dots<t_n=1\), the sum of squared returns \(R_i\) (aka the quadratic variation of \(R_i\)) is: \[ \sum_{i=1}^{n} R_i^2 = \sum_{i=1}^{n} \left[\log(S_{t_i} / S_{t_{i-1}})\right]^2 \]
So the definition of volatility in mathematical form is: \[ \sigma = \sqrt{\sum_{i=1}^{\infty} \left[\log(S_{t_i} / S_{t_{i-1}})\right]^2} \]
This volatility \(\sigma\) is called true volatility . \(\sigma^2\) is called true variance .
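This definition can be checked numerically. The following Python sketch simulates the log returns of a geometric Brownian motion over one unit of time on a fine, equally spaced grid and takes the square root of the sum of squared returns. The function names, the chosen parameters (\(\sigma = 0.3\), \(\mu = 0.05\)) and the grid size are arbitrary assumptions for illustration.

```python
import math
import random


def simulate_log_returns(sigma, mu, n, seed=0):
    """Log returns of a GBM over one unit of time split into n equal steps."""
    rng = random.Random(seed)
    dt = 1.0 / n
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    return [drift + vol * rng.gauss(0.0, 1.0) for _ in range(n)]


def sqrt_sum_of_squares(returns):
    """Square root of the sum of squared returns."""
    return math.sqrt(sum(r * r for r in returns))


estimate = sqrt_sum_of_squares(
    simulate_log_returns(sigma=0.3, mu=0.05, n=100_000))
```

With a grid this fine, `estimate` lands very close to the \(\sigma\) used in the simulation, and the drift \(\mu\) has almost no influence on it.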
3.4.2 Market Microstructure Effects.
High-frequency data have some unique characteristics that do not appear at lower frequencies. There are several well known phenomena, such as asynchronous trading, bid-ask bounce and minimum tick rules, which are called market microstructure effects in the finance literature.
The following figure is generated from BITA’s compounded return time series with different sampling intervals: 1 minute, 1 hour and 1 day. In the distribution subplots, the red dashed line is the corresponding normal distribution. When the interval length is 1 day, the distribution is a right-skewed, leptokurtic bell curve. However, as the sampling frequency increases, the skewness decreases and the kurtosis increases. When the interval length is 1 minute, the skewness becomes negative and the kurtosis reaches as high as 45.5.
Market Microstructure Effects on Log Return.
This means the statistical properties of the data change as the sampling frequency increases. In high frequency data, the observed price is no longer the stock’s intrinsic price, but a trade price heavily distorted by market microstructure effects.
I use \(P_t\) to represent an unknown stochastic process equal to the logarithm of the stock’s intrinsic (true) price, and \(Q_t\) to represent another stochastic process equal to the logarithm of the stock’s observed trade price. The two are related by \[ Q_t = P_t + \epsilon_t \]
where \(\epsilon_t\) is an i.i.d. noise process with \[ \begin{aligned} E[\epsilon_t] &= 0 \\ Var[\epsilon_t] &= E[\epsilon_t^2] = c \end{aligned} \]
The noise variance \(c\) is a constant in this model. The noise is not necessarily normal, but it should be symmetric and weakly stationary. Also, \(\epsilon_t\) is independent of \(P_t\).
3.4.3 Realized Volatility and Volatility Proxies.
Although we have a math formula for true volatility, we can never get its precise value. First, it is a continuous calculus form equation, but in the real world, the price is always discrete. Second, market microstructure effects, as described in previous section, also distort the price, making trade price not exactly the same as stock’s intrinsic price as defined in our model. In order to make the return data close to normal distribution, which is a basic assumption in many financial models, one has to sample the trade price at sufficiently wide interval to avoid market microstructure effects, and in turn this will make the price more discrete.
So we have to introduce another concept called realized volatility. It is essentially a discrete version of the true volatility defined earlier. If we split the time unit \(T\) equally into \(N\) smaller time intervals of length \(t\), we have the sampling frequency \(N\):
\[ N = \frac{T}{t} \]
and realized volatility is defined as:
\[ \hat\sigma = \sqrt{\sum\limits_{i=1}^{N} (Q_{t_i} - Q_{t_{i-1}})^2} \]
and the realized variance is accordingly defined as:
\[ \hat\sigma^2 = \sum\limits_{i=1}^{N} (Q_{t_i} - Q_{t_{i-1}})^2 \]
Note that \(Q\) here is the observed price, not the true price \(S\).
Realized volatility (aka integrated volatility) is a biased estimator of true volatility due to market microstructure effects. I will show this theoretically and empirically later. Correspondingly, the square of realized volatility is called realized variance, integrated variance, or sometimes realized quadratic variation.
Note that in some of the literature, realized volatility and realized variance are used interchangeably. In addition, two other volatilities are often seen in the literature. (1) Implied volatility is just a number calculated from the option price according to the Black-Scholes formula, assuming all the assumptions of the Black-Scholes model are correct. (2) Historical volatility normally means the past daily volatility calculated from historical data according to parametric conditional volatility models like GARCH and EWMA, or stochastic volatility models.
Because true volatility is not known, one can use volatility proxies when specifying and evaluating volatility models. We can consider a proxy as a mapping of the original variable into another space through a proxy function. In statistics, a proxy is a variable that is not of prime interest itself but is closely connected to an object of interest. One uses a proxy to replace a latent variable of interest, so the absolute correlation between the proxy variable and the original variable should be close to 1. Note that one can use an estimator, either biased or unbiased, as a proxy, but it is probably wrong to use a proxy as an estimator.
3.4.4 Market Microstructure Effects and Volatility Proxies.
Realized variance is often used as a volatility proxy when high frequency data are available. But surprisingly, due to market microstructure effects, we may get worse results when we use higher frequency data.
For the noise process, we have \[ E[\epsilon_{t_i}\epsilon_{t_{i-1}}] = E[\epsilon_{t_i}]E[\epsilon_{t_{i-1}}] = 0 \] because \(\epsilon_{t_i}\) and \(\epsilon_{t_{i-1}}\) are independent. Writing \(R_{t_i} = P_{t_i} - P_{t_{i-1}}\) for the true log return, each observed return is \(Q_{t_i} - Q_{t_{i-1}} = R_{t_i} + (\epsilon_{t_i} - \epsilon_{t_{i-1}})\).
The expectation is: \[ \begin{aligned} E[\hat\sigma^2] &= E[\sum\limits_{i=1}^N [ R_{t_i} + ( \epsilon_{t_i} - \epsilon_{t_{i-1}})] ^2 ] \\ &= E[\sum\limits_{i=1}^N [ R_{t_i}^2 + 2R_{t_i}( \epsilon_{t_i} - \epsilon_{t_{i-1}}) +( \epsilon_{t_i} - \epsilon_{t_{i-1}})^2] ] \\ &= E[\sigma^2] + 2Nc \end{aligned} \] The last step holds because the cross term has zero expectation (the noise has zero mean and is independent of the true price) and \(E[(\epsilon_{t_i} - \epsilon_{t_{i-1}})^2] = 2c\). The variance is: \[ Var[\hat\sigma^2] = 4 N E[\epsilon ^4] + O_p(1) \] This proves that realized variance is a biased estimator of true variance. The higher the sampling frequency is, the bigger \(N\) is, and the bigger the bias is. When \(N\) goes to infinity, the bias and realized variance go to infinity too. Zhang proposed that, when \(N\) is large enough, \(E[\sigma^2]\) becomes negligible and we can get the value of \(c\), the variance of the noise process, with this formula: \[ c = \frac{E[\hat\sigma^2]}{2N} \]
Once we get the value of \(c\) , we can use the same equation to get \(E[\sigma^2]\) .
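To make the bias concrete, here is a minimal simulation sketch; it is not the author's program. It assumes the true log price is a Gaussian random walk and the noise is Gaussian, and the function name `simulate_realized_variance` as well as the values of `sigma2`, `c` and `days` are hypothetical choices for illustration. Under these assumptions the sample mean of realized variance should sit close to \(E[\sigma^2] + 2Nc\) and grow with \(N\).

```python
import random


def simulate_realized_variance(n, sigma2, c, rng):
    """Return one realized variance computed from n noisy log returns.

    Assumptions (illustrative only): the true log price P is a Gaussian
    random walk with total variance sigma2 over the time unit, and the
    observed log price is Q = P + eps with i.i.d. Gaussian eps, Var[eps] = c.
    """
    step_sd = (sigma2 / n) ** 0.5
    noise_sd = c ** 0.5
    p = 0.0
    q_prev = p + rng.gauss(0.0, noise_sd)
    rv = 0.0
    for _ in range(n):
        p += rng.gauss(0.0, step_sd)      # move of the true log price
        q = p + rng.gauss(0.0, noise_sd)  # observed (noisy) log trade price
        rv += (q - q_prev) ** 2           # squared observed log return
        q_prev = q
    return rv


rng = random.Random(42)
sigma2, c, days = 2e-3, 1e-6, 200  # hypothetical values, not fitted to BITA
results = {}
for n in (26, 390, 1560):
    mean_rv = sum(simulate_realized_variance(n, sigma2, c, rng)
                  for _ in range(days)) / days
    theory = sigma2 + 2 * n * c
    results[n] = (mean_rv, theory)
    print(f"N={n:5d}  mean RV={mean_rv:.6f}  sigma2 + 2Nc={theory:.6f}")
```

The true variance is the same in all three runs; only the sampling frequency changes, so any increase in the mean comes from the noise term.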
But how do we decide whether \(N\) is large enough? I propose another method. Resample the raw data at two sampling frequencies \(N_1\) and \(N_2\), and get two estimated expectations of realized variance, \(\hat E_1[\hat\sigma^2]\) and \(\hat E_2[\hat\sigma^2]\). We have: \[ \begin{aligned} \hat E_1[\hat\sigma^2] &= E[\sigma^2] + 2N_1c \\ \hat E_2[\hat\sigma^2] &= E[\sigma^2] + 2N_2c \end{aligned} \]
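The two relations for \(N_1\) and \(N_2\) are linear in the two unknowns \(c\) and \(E[\sigma^2]\). The closed form below is not written out in the text, so treat it as a sketch; it follows from subtracting one relation from the other and assumes \(N_1 \neq N_2\):

\[ \begin{aligned} c &= \frac{\hat E_1[\hat\sigma^2] - \hat E_2[\hat\sigma^2]}{2(N_1 - N_2)} \\ E[\sigma^2] &= \hat E_1[\hat\sigma^2] - 2N_1c \end{aligned} \]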
3.4.5 Other Volatility Proxies.
Price range is a good volatility proxy which is free from the market microstructure effects. One definition is as simple as \(PR = Q_h - Q_l\) , where \(Q_h\) is the highest trade price in one time unit, \(Q_l\) is the lowest price accordingly.
The expectation of price range is: \[ \begin{aligned} E[PR] &= E[Q_h - Q_l] \\ &= E[P_h - P_l + ( \epsilon_h - \epsilon_l)]\\ &= E[P_h - P_l] \end{aligned} \]
We can see it is related to spread of true price in one time unit, but has nothing to do with \(\epsilon_t\) .
Another method to construct price range using high frequency data is to sum all subinterval price spreads in one time unit. To avoid confusion, if necessary, I will use price range(H-L) for the first definition and price range(sum of H-L) for the second one. By default, price range means the first definition.
In addition, people sometimes use absolute return as a volatility proxy. It is very similar to price range, but because the log return only considers the previous close price and the current close price, it misses information between the two time points, so it has a downward bias.
3.4.6 Realized Variance and Other Volatility Proxies.
Realized variance is a biased estimator, and also a proxy, of true variance. First, let's compare it with another well-known volatility proxy, price range. The raw data is 15-second OHLC bar data of BITA from IB. I choose 5 minutes as the time unit, so according to the realized variance definition, with the number of sampling intervals \(N\) equal to 20, we can get the value of realized variance. It is noteworthy that, for price range, I use the highest price in 5 minutes minus the lowest price, not the sum of high minus low over the twenty 15-second OHLC bars.
I randomly choose one day and compare these two variance proxies. The result is shown in the figure below.
Realized Variance VS. Price Range(H-L) in one day.
The upper graph compares the absolute values. The value of realized variance is so small that it appears as a straight line just above the x axis. After multiplying every number in the realized variance series by a scale-up factor of 180.6, I get the lower graph. It looks much better than the upper one. It is easy to see that the two time series have the same trend, with only very minor differences between them.
The figure verifies that price range is a good proxy for stock variance and volatility. The proxy function in this case is just multiplication by the constant 180.6.
Now, let's add two more proxies: absolute return and price range(sum of H-L). As described in the previous section, absolute return is calculated as the absolute log return over the time unit, and price range(sum of H-L) is calculated by adding all the high-low differences of the 15-second OHLC bars in one time unit. In my program and graphs, I use rvar for realized variance, prange for price range(H-L), srange for price range(sum of H-L) and absr for absolute return.
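As an illustration of how the four proxies differ, here is a minimal sketch that computes them for a single time unit from OHLC bars. It is not the author's program: the function name `volatility_proxies`, the tuple layout of a bar and the sample prices are assumptions, and all proxies are computed on log prices, following the definition \(PR = Q_h - Q_l\).

```python
import math


def volatility_proxies(prev_close, bars):
    """Compute four volatility proxies for one time unit.

    prev_close: close price of the last bar before the time unit.
    bars: list of (open, high, low, close) tuples inside the time unit.
    All proxies are computed on log prices (an assumption of this sketch).
    """
    log_prev = math.log(prev_close)
    log_high = [math.log(h) for _, h, _, _ in bars]
    log_low = [math.log(l) for _, _, l, _ in bars]
    log_close = [math.log(c) for _, _, _, c in bars]

    # rvar: sum of squared close-to-close log returns
    rvar = 0.0
    last = log_prev
    for c in log_close:
        rvar += (c - last) ** 2
        last = c

    # prange: highest high minus lowest low over the whole time unit
    prange = max(log_high) - min(log_low)
    # srange: sum of the high-low spreads of the individual bars
    srange = sum(h - l for h, l in zip(log_high, log_low))
    # absr: absolute log return over the whole time unit
    absr = abs(log_close[-1] - log_prev)
    return {"rvar": rvar, "prange": prange, "srange": srange, "absr": absr}


# Made-up bars, for illustration only
bars = [
    (100.0, 101.0, 99.5, 100.5),
    (100.5, 102.0, 100.0, 101.5),
    (101.5, 101.8, 100.2, 100.4),
]
proxies = volatility_proxies(100.0, bars)
for name, value in proxies.items():
    print(f"{name:7s} {value:.6f}")
```

Taking the previous close as an explicit argument keeps the return calculation close-to-close, while the two range proxies only need the bars inside the time unit.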
Then I choose 13 time units from 2 minutes to 1 day:
Still using 15-second OHLC bar data of BITA, I calculate each volatility proxy for every time unit above. After getting the results, I check their statistical characteristics to verify the noise model.
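From the expectation and variance of realized variance derived above, we can get the variation coefficient \(k\):
\[ k = \frac{\sqrt{Var[\hat\sigma^2]}}{E[\hat\sigma^2]} = \frac{\sqrt{4NE[\epsilon^4] + O_p(1)}}{E[\sigma^2] + 2Nc} \]
Suppose \(N\) is large enough. If the time unit increases by \(m\) times (\(m>1\)) while the sampling interval is fixed, \(N\) becomes \(mN\) and, according to the volatility square-root-of-time rule, \(E[\sigma^2]\) becomes \(mE[\sigma^2]\), so we have:
\[ k_m \approx \frac{\sqrt{4mNE[\epsilon^4]}}{mE[\sigma^2] + 2mNc} \approx \frac{k}{\sqrt{m}} \]
This means, if the sampling interval is fixed and \(N\) is large enough, the variation coefficient \(k\) of realized variance decreases at the rate \(O(m^{-1/2})\) as the length of the time unit increases.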
To verify this conclusion, I check the relation between the variation coefficient and the time unit and get the following figure:
Market Microstructure Effects on Volatility Proxies.
We can see that market microstructure effects have a big impact on realized variance. When the length of the time unit decreases, the variation coefficient increases dramatically. De Vilder and Visser showed that a smaller variance corresponds to a better volatility proxy. We can see that realized variance becomes stable and close to the other proxies when the time unit increases to 1.5 hours.
For the other three proxies, there is no obvious change in the variation coefficient, which means they do not suffer from market microstructure effects. Also, it is well known that measurements that are log-normally distributed exhibit a stationary variation coefficient, \(\sqrt{\exp(\sigma^2) - 1}\), where \(\sigma^2\) is the variance of the logarithm, so the figure also implies that true variance is log-normally distributed.
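The property used here is that the coefficient of variation of a lognormal variable is \(\sqrt{\exp(\sigma^2) - 1}\), which depends only on the log-scale standard deviation and not on the level. A minimal sketch of a numerical check follows; the helper names `lognormal_cv` and `sample_cv` and the parameter values are hypothetical, chosen only for illustration.

```python
import math
import random


def lognormal_cv(sigma):
    """Theoretical coefficient of variation of a lognormal variable."""
    return math.sqrt(math.exp(sigma ** 2) - 1.0)


def sample_cv(values):
    """Sample coefficient of variation: standard deviation over mean."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(var) / mean


rng = random.Random(0)
sigma = 0.5  # hypothetical log-scale standard deviation
cvs = {}
for mu in (-8.0, -6.0, -4.0):  # different levels, same sigma
    sample = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]
    cvs[mu] = sample_cv(sample)
    print(f"mu={mu:5.1f}  sample CV={cvs[mu]:.4f}  theory={lognormal_cv(sigma):.4f}")
```

Changing `mu` shifts the level of the simulated variance by orders of magnitude, yet the sample coefficient of variation should stay near the same theoretical value.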
A good proxy should be closely correlated with the original variable and with other good proxies too. The figure below displays how the correlation coefficients change with the time unit. We can see that the correlation between realized variance and price range increases dramatically as the length of the time unit increases. This means realized variance becomes a better proxy when the time unit is large enough, say 1.5 hours.
Bias and Consistency of Volatility Proxies.
3.4.7 Daily Realized Variance and Noise Process.
In the previous section, we fixed the length of the time interval \(t\), increased the time unit \(T\) and found that market microstructure effects have a strong impact on realized variance. In this section, I am going to fix the time unit \(T\) at 1 day and change the length of the time interval \(t\). I will show how the market microstructure noise process affects daily realized volatility when the sampling interval changes, and investigate two ways to get the variance of the noise process.
Still using BITA 15-second OHLC bar data and the realized variance definition, but choosing three different time intervals (15 seconds, 10 minutes and 2 hours), I get three daily realized variance time series and display them in the figure below.
Daily Realized Variance at Different Sampling Intervals.
In the figure, rvar_1 means the sampling interval is 15 seconds, rvar_40 means 10 minutes, and rvar_480 means 2 hours. We can see the trend is almost the same, but the red dots (rvar_480) are distributed closest to the x axis, the blue dots (rvar_1) are the farthest, and the green dots (rvar_40) are in between. This means that when the sampling interval increases, or equivalently when the sampling frequency \(N\) decreases, the expectation of daily realized variance decreases accordingly. This is the expected result according to the expectation equation \(E[\hat\sigma^2] = E[\sigma^2] + 2Nc\).
Now let’s try more different sampling intervals. I choose 7 intervals as follows:
Correspondingly, the time intervals are 15 seconds, 1 minute, 2 minutes, 5 minutes, 10 minutes, 20 minutes and 40 minutes.
Expectation of Daily Realized Variance at Different Sampling Intervals.
The x axis represents the sampling interval and the y axis represents the expectation of daily realized variance, which is approximated by the sample mean. We can see that as the sampling interval increases, which corresponds to a smaller \(N\), the expectation of daily realized variance decreases. This is in line with the expectation equation \(E[\hat\sigma^2] = E[\sigma^2] + 2Nc\).
When the interval is 15 seconds, \(N\) is equal to 1560 because a trading day lasts six and a half hours. This is the highest frequency data I can get. Assuming \(N\) is large enough (1) to ignore \(E[\sigma^2]\) in the expectation equation and (2) to get the population expectation \(E[\sigma^2]\), and using the method proposed by Zhang, we get a noise process variance \(c\) of 7.5347758757e-07.
Alternatively, I tried the two-frequency equations with \(N_1\) and \(N_2\) too. Assuming the first two sampling frequencies \(N_1\) (1560) and \(N_2\) (390) are large enough for the population expectation \(E[\sigma^2]\), I get a noise process variance \(c\) of 1.30248047255e-07.
The two results are different because the 15-second time interval is too long.
In other words, the data frequency \(N\) is not high enough to ignore \(E[\sigma^2]\). According to the formula:
\[ c = \frac{E[\hat\sigma^2] - E[\sigma^2]}{2N} \]
when true variance is not negligible, if one drops \(E[\sigma^2]\) as Zhang's formula does, one will overestimate the numerator and therefore overestimate the noise process variance \(c\).
Fortunately, the two-frequency equations do not require \(N\) to be large enough to ignore \(E[\sigma^2]\). Assuming they are correctly applied here, \(c\) equals 1.30248047255e-07 when \(N = 1560\), and in turn we can get the expectation of true variance: \[ \begin{aligned} E[\sigma ^2] &= E[\hat \sigma ^2] - 2Nc \\ &= 0.0023508500732184 - 2 \times 1560 \times 1.30248047255 \times 10^{-7} \\ &= 0.00194447616578 \end{aligned} \]
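The arithmetic can be rechecked with a few lines of Python. This is only a sketch that plugs in the numbers quoted in the text; the variable names are mine, and `noise_share` is an extra quantity added for illustration (the fraction of the 15-second daily realized variance attributed to noise under the two-frequency estimate).

```python
n = 1560                        # 15-second intervals in a 6.5-hour trading day
mean_rv = 0.0023508500732184    # sample mean of daily realized variance at N = 1560
c_two_freq = 1.30248047255e-07  # noise variance from the two-frequency equations

# Zhang's approximation: ignore E[sigma^2], so c = E[rv] / (2N)
c_zhang = mean_rv / (2 * n)

# Back out the expectation of true variance: E[sigma^2] = E[rv] - 2Nc
true_var = mean_rv - 2 * n * c_two_freq

# Share of the observed realized variance that is noise (illustrative)
noise_share = 2 * n * c_two_freq / mean_rv

print(f"c (Zhang)      = {c_zhang:.10e}")
print(f"E[sigma^2]     = {true_var:.14f}")
print(f"noise share    = {noise_share:.4f}")
```

Both values reported in the text come out of this calculation, and the gap between the two estimates of \(c\) is visible directly.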
Both Zhang's formula and the two-frequency equations require high frequency data, but the latter is only affected by the accuracy of the expectation estimates. With data of the same frequency, the two-frequency equations are better because they do not require \(N\) to be large enough to ignore \(E[\sigma ^2]\).
3.4.8 Three Schemes for Realized Variance Calculation.
In the previous section, although we always used the realized variance definition to calculate daily realized variance, we actually used two schemes.
Scheme 1 calculates the squared return for every adjacent pair of prices sequentially in one unit of time \(T\), and then sums all the squared returns. The figure below illustrates how the calculation proceeds. I call it the classical scheme as it follows the realized variance definition exactly. In the previous section, I verified that the classical scheme is the most seriously affected by market microstructure effects because high frequency data are contaminated by the noise process. When the sampling frequency is high, it shows a strong upward bias, making the result useless. In the realized variance time series calculated with this scheme, you can see many spikes, which correspond to a high variation coefficient.
Classical Scheme to Calculate Realized Variance.
Scheme 2 splits one time unit into multiple grids. A grid is a new sampling interval whose length lies between \(t\) and \(T\). Scheme 2 uses only one data point in each grid and ignores all other data, so I call it the sparse sampling scheme. In the program that generated the figures of the previous section, I use the first price in each grid to represent the price of the new sampling interval, and calculate rvar_40 and rvar_80. The figure below illustrates how the calculation proceeds.
Sparse Sampling Scheme to Calculate Realized Variance.
According to the theoretical and empirical analysis in the previous section, the sparse sampling scheme performs better than the classical scheme. This is very surprising as it uses much less data. In the figure, if one cell represents a 15-second OHLC bar, we have 1560 cells for one day. If the new sampling interval is 1 minute, then according to sparse sampling we need to throw away 1170 = 1560 × 3/4 price data points. But when we use the remaining 390 price data points, we get an even better result. This sounds counterintuitive but can be perfectly explained by the noise model. Note that there are two intervals in sparse sampling: the original interval is 15 seconds, and the new interval after sparse sampling is 1 minute. To avoid confusion, I will use the word grid for the latter from now on, which is how Zhang names it in the original paper.
Can we take advantage of all data and throw away only the noise part in trade price?
Here scheme 3 comes into play. It is a natural extension of scheme 2. It uses all the data but is also robust to market microstructure effects. As displayed in the figure, we apply the same return calculation as in sparse sampling, not only starting from the first cell in each grid, but also starting from each of the other cells. In the figure, there are four cells in one grid, so we get four results, and the final result is their average. This method was proposed by Lan Zhang (2003). I call it the averaging scheme because it improves on the sparse sampling scheme by averaging.
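The three schemes can be sketched in a few lines of Python. This is a simplified illustration rather than the author's implementation: the function names are hypothetical, the input `q` is assumed to be a list of log prices for one time unit, `k` is the grid length measured in cells, and edge effects (offset sub-grids cover slightly less than the full time unit) are ignored.

```python
def rv_classical(q):
    """Scheme 1: sum of squared returns over every adjacent pair of prices."""
    return sum((b - a) ** 2 for a, b in zip(q, q[1:]))


def rv_sparse(q, k, offset=0):
    """Scheme 2: keep one price per grid of k cells, starting at `offset`."""
    return rv_classical(q[offset::k])


def rv_average(q, k):
    """Scheme 3: average the sparse results over all k possible offsets."""
    return sum(rv_sparse(q, k, offset) for offset in range(k)) / k


# A toy log-price path that only bounces between two levels (pure noise).
q = [0.0, 1.0] * 4 + [0.0]

print("classical:", rv_classical(q))     # every bounce is counted
print("sparse   :", rv_sparse(q, 2))     # sampling every 2nd cell skips the bounce
print("averaging:", rv_average(q, 2))    # average over both offsets
```

In this toy path the true price never moves, so the classical scheme reports pure noise while the other two report none. With a grid length of one cell, both sparse sampling and averaging reduce to the classical scheme.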
Averaging Scheme to Calculate Realized Variance.
In theory, averaging scheme should be better than the other two. I am going to verify this as below.
Averaging Scheme vs Classical Scheme.
Still using BITA 15-second OHLC data, I compare the classical scheme and the averaging scheme in the figure below:
Classical Scheme VS Averaging Scheme.
The purple dots are the realized variance results from the classical scheme and the green ones are from the averaging scheme with grid length equal to 1 hour (240 × 15 seconds). We can see the green dots are distributed at the bottom, closer to the x axis, which reflects the overestimation issue of the classical scheme. This shows that the averaging scheme is better than the classical scheme.
Averaging Scheme vs Sparse Sampling Scheme.
Now let’s compare sparse sampling scheme and averaging scheme . I choose 8 grid lengths as follows.
I use the two schemes to calculate daily realized variance, and then the expectation \(E[\hat \sigma^2]\) under each grid length. The result is displayed in the figure below:
Sparse Sampling Scheme VS Averaging Scheme.
We can see that the averaging scheme has a lower \(E[\hat \sigma^2]\) than the sparse sampling scheme. This means the former suffers less from market microstructure noise, so it is better. Note that if the grid length becomes the same as the sampling time interval, the sparse sampling scheme and the averaging scheme both degrade to the classical scheme. This is why, when the grid length equals 15 seconds, the purple dot and the green dot coincide.
We have seen that the averaging scheme is the best of the three schemes. We also see that the grid length affects its results. Let me increase the grid length from 15 seconds to 40 minutes and draw the realized variance time series in the figure below.
Averaging Scheme and Different Grid Length.
We can see the best result is the one with grid length equal to 40 minutes. We can display \(E[\hat \sigma^2]\) with grid length in figure .
Expectation of Realized Variance with Averaging Scheme and Different Grid Length.
We can see the expectation curve is a smooth convex curve. It decreases exponentially as the grid length increases. But after 20 minutes, \(E[\hat \sigma^2]\) does not decrease any more. This is because if the grid length is too long, we can no longer use all the data, and the averaging scheme becomes more like the sparse sampling scheme. For instance, when the grid length is the same as the time unit \(T\), which is 1 day in our case, the averaging scheme degrades to the sparse sampling scheme.
To verify this, I choose 13 grid lengths ‘30seconds’, ‘1min’, ‘2min’, ‘5min’, ‘10min’, ‘20min’, ‘40min’, ‘1H’, ‘1.25H’, ‘1.5H’, ‘1.75H’, ‘2H’, ‘2.25H’, and draw \(E[\hat \sigma^2]\) in the figure below.
Averaging Scheme and Different Grid Length.
Green curve is sparse sampling scheme and blue curve is averaging scheme . x axis is grid length and y axis is \(E[\hat \sigma^2]\) .
We can see that, for the averaging scheme, \(E[\hat \sigma^2]\) keeps increasing very slowly after 40 minutes. Also, because the averaging scheme is actually an average of many equally reasonable results, it is smoother than the sparse sampling scheme. After 40 minutes, the sparse sampling scheme curve jumps up and down around the averaging scheme curve. This means there is an optimal value for the grid length between the sampling time interval \(t\) and the time unit \(T\). In this case, it is around 40 minutes. When the grid length equals \(t\), the averaging scheme becomes the classical scheme; when it equals \(T\), the averaging scheme becomes the sparse sampling scheme.
3.4.9 True Variance and Volatility.
In previous sections, I got the variance \(c\) of noise process \(\epsilon_t\) . I also found that averaging scheme is the best way to calculate realized variance with grid length equal to 40 minutes in this case. I have reached my goal. I am ready to calculate true variance and true volatility now!
See figure for true volatility series I created using the information above.
I can also get the statistics of the true variance time series. Taking the logarithm of true variance, we get the distribution in the figure below.
Logarithmic True Variance Distribution.
The dashed blue line is the normal distribution curve fitted with the same mean and standard deviation as the data. We can see the distribution is close to normal. We know variance has properties like clustering and mean reversion, and now we know that the logarithm of variance follows a Gaussian distribution, that is, variance follows a lognormal distribution. This also supports the earlier conclusion that the stationary variation coefficient of the volatility proxies implies they are log-normally distributed.
True volatility is the square root of true variance. I checked its distribution and it is also lognormal.
Previously we used price range as a proxy of true variance. Now we can check the distribution of price range and see if it has the same distribution as true variance. The figure below shows the daily price range series and the distribution I get from our BITA dataset.
Logarithmic Price Range Distribution.
The red dashed line is the normal distribution curve fitted with the corresponding mean and standard deviation. The distribution is very similar to that of logarithmic true variance. This is in line with our knowledge that price range is a good proxy for true variance.
3.4.10 Data Selection and Conclusion Generality.
To take a new nonparametric approach to calculating volatility, I need high frequency data. The data I use in this case study is BITA 15-second OHLC bar data from 2013-12-06 9:30 AM to 2015-12-31 4:00 PM. I got the data with the histData tool described in the section Historical Data Collection Tool - histData. There are 806,880 bars in the dataset, stored as a CSV file named BITA_2013-12-06_2015-12-31.csv. You can download it from quant365/post/99/.
I also want to emphasize that the BITA data were picked from the database randomly. The stock has no special importance itself. The conclusions drawn in the previous sections should also apply to other stocks.
It is noteworthy that, for two adjacent OHLC bars, the close price of the first bar is not necessarily equal to the open price of the second bar. When we calculate a return, we have to use two bars to calculate the close-to-close return. But when we calculate price range, we can use the high price minus the low price of the same bar.
3.5 Future Direction.
Consider the relation between the noise process and trading frequency in the noise process model.
Support more programming languages.
Use a cluster for faster computing (Spark - Lightning-fast cluster computing) for Monte Carlo simulation and big matrix calculation.
Integrate with the Sentosa trading system and web platform.
4 Part IV – Sentosa Web Platform.
Initially, the Sentosa web platform was a Django blog website called qblog that I developed to write a trading diary; it features markdown and mathematical formula support. Later I added a sentosaapp module to monitor and debug the Sentosa trading system. Finally I extended it so that it can interact with the Sentosa trading system completely. It uses JavaScript WebSocket to communicate with the Sentosa trading system and displays its internal status on the webpage using jQuery. It can also be used to send orders to the Sentosa trading system.
Although this is a very important part of Sentosa, it is not directly related to any Finance knowledge so I just introduce it very briefly in one page. For more details, please check Sentosa website.
The following is the screenshot of Sentosa web platform:
Sentosa Web Platform in Backtesting Mode with Real Historical Data.
As for future development, this web platform can be extended to do online trading.
5 Reference.
Christian Brownlees, Robert Engle and Bryan Kelly, (2011), A Practical Guide to Volatility Forecasting through Calm and Storm.
Zhang, Lan, Per A. Mykland and Yacine Ait-Sahalia. “A Tale Of Two Time Scales: Determining Integrated Volatility With Noisy High-Frequency Data,” Journal of the American Statistical Association, 2005, v100(472,Dec), 1394-1411.
Alizadeh, S., Brandt, M., and Diebold, F. X. (2002). Range-based estimation of stochastic volatility models. Journal of Finance 57: 1047–1092.
Andre Christoer Andersen, Stian Mikelsen, (2012), A Novel Algorithmic Trading Framework Applying Evolution and Machine Learning for Portfolio Optimization.
Stoll, H. and Whaley, R. (1990). Stock market structure and volatility. Review of Financial Studies 3: 37–71.
Andersen, T. G. and Bollerslev, T. (1998). Answering the skeptics: Yes, standard volatility models do provide accurate forecasts. International Economic Review 39: 885–905.
Andersen, T. G., Bollerslev, T., Diebold, F. X., and Labys, P. (2001b). The distribution of realized stock return volatility. Journal of Financial Economics 61: 43–76.
Bai, X., Russell, J. R., and Tiao, G. C. (2003). Kurtosis of GARCH and stochastic volatility models with non-normal innovations. Journal of Econometrics 114: 349–360.
Barndorff-Nielsen, O. E. and Shephard, N. (2004). Power and bi-power variations with stochastic volatility and jumps (with discussion). Journal of Financial Econometrics 2: 1–48.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31: 307–327.
Bollerslev, T. and Jubinski, D. (1999). Equity trading volume and volatility: Latent information arrivals and common long-run dependencies. Journal of Business & Economic Statistics 17: 9–21.
Bollerslev, T., Chou, R. Y., and Kroner, K. F. (1992). ARCH modeling in finance. Journal of Econometrics 52: 5–59.
Cao, C. and Tsay, R. S. (1992). Nonlinear time series analysis of stock volatilities. Journal of Applied Econometrics 7: s165–s185.
Visser, Marcel P., 2008. “Forecasting S&P 500 Daily Volatility using a Proxy for Downward Price Pressure,” MPRA Paper 11100, University Library of Munich, Germany.
Robin De Vilder & Marcel P. Visser, 2007. “Proxies for daily volatility,” PSE Working Papers halshs-00588307, HAL.
John C. Hull (2012). Options, Futures, and Other Derivatives, 8th Edition.
Ruey S. Tsay (2010). Analysis of Financial Time Series, 2nd Edition.
David Ruppert (2010). Statistics and Data Analysis for Financial Engineering, 1st Edition.
Alexios Ghalanos (2015). rugarch: Univariate GARCH models. R package version 1.3-6.
