The Tianya community recently developed a data engine, Memlink, and released it as open source. Why open-source the project, what can it do, and what advantages does it have over other projects of its kind? InfoQ's Chinese site interviewed Mr. Feng Yong, technical director at Tianya's R&D center in Beijing.
1. Hello, could you please introduce yourself first? What interesting things have you been working on recently?
Hello everybody! I am Feng Yong, head of the system platform department at the Tianya Technology Center. The department was newly established this year and is dedicated to optimizing the system architecture of Tianya's product lines. Tianya is a site with twelve years of history. Rebuilding and optimizing a system that has accumulated twelve years of patches is in itself a very interesting and very challenging thing.
2. What motivated Tianya to develop such a data engine, and then to open-source it?
In recent years NoSQL systems have become very popular, and they are a reasonable complement to SQL systems among the data solutions for Web applications. But among open source NoSQL systems there are many key-value systems to choose from and few key-list/queue systems, so we developed Memlink to meet our own needs.
Here I need to emphasize the key-list concept, because many real-world scenarios call for a key-list. For example: a forum's topic list and reply list, a microblog user's following list, a user's feed list, the feed list of the users someone follows, and so on. If you use a key-value system and store the list in the value (for example, packing the list into the value as JSON), operations on it are very inefficient.
An ideal key-list usually requires the following characteristics:
- The lists are massive, yet operations on them are efficient
- The list is ordered, and the order can be adjusted dynamically
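To make the cost of storing a list inside a key-value value concrete, here is a minimal Python sketch. It is not Memlink code: `kv_store`, `kl_store`, `kv_append`, and `kl_append` are hypothetical in-process stand-ins for a key-value store and a key-list store, used only to show how much data each append has to touch.

```python
import json

# Hypothetical stand-in for a key-value store: every value is an opaque string.
kv_store = {}

def kv_append(key, item):
    """Append to a list packed as JSON: decode everything, modify, re-encode."""
    items = json.loads(kv_store.get(key, "[]"))  # parse the whole list
    items.append(item)
    kv_store[key] = json.dumps(items)            # rewrite the whole list
    return len(kv_store[key])                    # characters rewritten by this append

# Hypothetical stand-in for a key-list store: the list is the native value.
kl_store = {}

def kl_append(key, item):
    """Append to a natively stored list: only the new item is written."""
    kl_store.setdefault(key, []).append(item)

for topic_id in range(1000):
    rewritten = kv_append("forum:42:topics", topic_id)
    kl_append("forum:42:topics", topic_id)
```

Both stores end up holding the same 1000 topic ids, but each `kv_append` call re-parses and rewrites the entire JSON string, so the work per append grows with the length of the list, while `kl_append` touches only the new element.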
As for why we open-sourced it: on the one hand, we have benefited a great deal from existing open source systems, so giving back to the open source community is our duty. On the other hand, sharing technology also helps a technology company grow and attracts more technical professionals.
3. Can you tell us about Memlink's features?
Memlink is a high-performance, persistent, distributed Key => List/Queue data engine. As the "Mem" in the name suggests, all data is built in memory to ensure the system's performance. It organizes memory as chains of blocks to keep the data compact, and uses redo-log technology to ensure data persistence. In addition, Memlink supports master-slave replication, read/write separation, data-item filtering, and other features.
- In-memory data engine with very high performance
- List nodes are organized as a chain of blocks, which saves memory and speeds up lookups
- A custom mask table can be defined for node data items, supporting a variety of filtering operations
- Redo-log support for data persistence, so it is not a cache
- Distributed, with master-slave synchronization
- Reads and writes are separated, with writes given priority.
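The block-chain organization and mask filtering listed above can be sketched roughly as follows. This is an illustrative Python sketch under assumptions, not Memlink's implementation or API: the names `Block`, `BlockList`, `push`, and `range`, the block capacity of 4, and the "all required bits must be set" mask rule are all hypothetical.

```python
BLOCK_SIZE = 4  # assumed tiny capacity so the example is easy to trace

class Block:
    """A fixed-capacity array of (value, mask) nodes plus a link to the next block."""
    def __init__(self):
        self.nodes = []
        self.next = None

class BlockList:
    """One list stored as a chain of blocks instead of one node per allocation."""
    def __init__(self):
        self.head = Block()
        self.tail = self.head
        self.block_count = 1

    def push(self, value, mask=0):
        """Append a node, chaining a new block when the tail block is full."""
        if len(self.tail.nodes) == BLOCK_SIZE:
            self.tail.next = Block()
            self.tail = self.tail.next
            self.block_count += 1
        self.tail.nodes.append((value, mask))

    def range(self, start, count, required_mask=0):
        """Skip `start` matching nodes, then return up to `count` values.
        A node matches when every bit of `required_mask` is set in its mask."""
        result = []
        block = self.head
        while block is not None and len(result) < count:
            for value, mask in block.nodes:
                if (mask & required_mask) != required_mask:
                    continue  # filtered out by the mask
                if start > 0:
                    start -= 1
                    continue
                result.append(value)
                if len(result) == count:
                    break
            block = block.next
        return result

VISIBLE = 0b01  # hypothetical flag bits
PINNED = 0b10

topics = BlockList()
for topic_id in range(10):
    topics.push(topic_id, VISIBLE | (PINNED if topic_id % 3 == 0 else 0))

first_page = topics.range(0, 5)      # first five topics, no filtering
pinned = topics.range(0, 5, PINNED)  # only topics with the PINNED bit set
```

Packing several nodes into each block means one link per block rather than one per node, which is the memory saving the feature list refers to. A production engine would also need deletion, insertion at a position, and concurrency control, none of which this sketch attempts.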
4. We know there are other in-memory data engines on the market, such as Redis and Scalaris. Compared with them, what particular problems does Memlink solve?
During Memlink's design and development we carefully analyzed and compared it with Redis. In the end we did not use Redis, for the following four reasons:
- Redis's persistence strategies cannot fully meet the needs of a production environment. A mature Internet application should have sufficient fault tolerance, for example losing no data across system restarts, downtime, and so on. Redis persistence strategy one is to sync to disk periodically (data written since the last sync is lost on restart). Strategy two is to keep appending to a log, which lets the log grow large and hurts performance. Memlink's persistence strategy draws on both: outside of snapshot creation it appends to the redo-log, and once a snapshot completes the redo-log is cleared.
- Redis's master-slave synchronization strategy is inadequate. For example, if a slave loses part of the synchronized data for some reason, it has to fetch the complete data set from the master again. With large amounts of data, this does not meet production requirements.
- Redis is single-threaded, does not separate reads from writes, and can use only a single core. Memlink is multi-threaded, makes full use of multiple cores, and separates reads from writes, giving priority to writes.
- Memlink is better than Redis in memory consumption and performance.
Memlink is a key => list/queue engine, while Scalaris is a key-value store, so the two start from different functional goals.
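The persistence approach described in the first reason above (append to a redo-log between snapshots, clear the log once a snapshot completes) can be sketched in Python as follows. This is a toy illustration under assumptions, not Memlink's code: the class `SnapshotLogStore`, its methods, the JSON encoding, and the file names `snapshot.json` and `redo.log` are all hypothetical.

```python
import json
import os
import tempfile

class SnapshotLogStore:
    """Toy key => list store persisted as a snapshot file plus a redo-log."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")  # hypothetical name
        self.log_path = os.path.join(directory, "redo.log")            # hypothetical name
        self.data = {}
        self._recover()
        self.log = open(self.log_path, "a", encoding="utf-8")

    def _recover(self):
        """Load the last snapshot, then replay the writes logged after it."""
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data.setdefault(key, []).append(value)

    def push(self, key, value):
        """Write the operation to the redo-log before applying it in memory."""
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data.setdefault(key, []).append(value)

    def snapshot(self):
        """Write a full snapshot, swap it in atomically, then clear the redo-log."""
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.snapshot_path)
        self.log.close()
        self.log = open(self.log_path, "w", encoding="utf-8")  # truncates the log

    def close(self):
        self.log.close()

with tempfile.TemporaryDirectory() as directory:
    store = SnapshotLogStore(directory)
    store.push("topics", 1)
    store.push("topics", 2)
    store.snapshot()         # the redo-log is now empty
    store.push("topics", 3)  # only this write is in the redo-log
    store.close()            # simulate a restart

    recovered = SnapshotLogStore(directory)
    restored = recovered.data["topics"]  # snapshot contents plus replayed log
    recovered.close()
```

The log stays short because it only ever holds the writes made since the last snapshot, and a restart loses nothing because every write is logged before it is applied. The sketch is single-threaded and ignores a crash that lands between swapping in the snapshot and truncating the log. A real engine would have to handle that case, for instance by tagging log records with sequence numbers.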
5. Which systems inside Tianya already use Memlink? Can you share data on the performance changes Memlink has brought?
Memlink is mainly used in Tianya's forum-type products (the forum and Laiba). Take the forum's topic list: when the data reaches the order of millions or tens of millions, paging with MySQL basically cannot respond, while Memlink performs about a hundred times better. See the Benchmark for details.
6. For our many developer friends, can you tell us how to choose a NoSQL product?
First, determine from the business requirements whether a NoSQL product is needed at all. For most applications with data on the order of millions or tens of millions, MySQL can cope.
Second, once it is clear that a NoSQL product is needed, abstract the data model from the business needs: some data needs a key-value store, some needs a key-list store, some needs a document database, and so on.
The candidate NoSQL products should then be evaluated along the following dimensions:
- Do the system's capacity, performance, and hardware and software environment match the requirements?
- What is the data-safety mechanism? Will data be lost under various failures?
- Is there master-slave replication? What is the consistency strategy?
- Scalability: does it scale out automatically, or is there an expansion plan?
- Controllability of the system: its maturity, the level of developer support, bug fixing, and so on.
7. What is Memlink's current version? What are the future development plans?
Memlink's current version is 0.2. It has the basic key-list, master-slave replication, and other functions, and is currently being tested.
In versions 0.3/0.4, Memlink will add a double-ended queue, user authentication, and other functions. See Memlink's RoadMap for details.
In the long run, Memlink will stay focused on being a high-performance, persistent, distributed Key => List/Queue data engine and will not add other data storage models.